Re: distribution comparison
- From: "Viktor Martyanov" <vmartyan@xxxxxxxxxxxxxxxx>
- Date: Mon, 7 Aug 2006 14:58:17 -0400
Peter Perkins wrote:
Viktor Martyanov wrote:
I am using Distribution Fitting Tool to fit my data to a particular
distribution. I think I can make some conclusions on the basis of
visual results. But how can I quantify differences between actual
data and theoretical distributions (e.g. by comparing their
variances)? Not sure how to do that while evaluating the fits within
Distribution Fitting Tool.
Viktor, there are measures such as the log-likelihood (which the tool gives
you), AIC/BIC (which are simple to compute given the LL), or Kolmogorov
statistics (which you can get from KSTEST). But none of these scalar statistics
are going to tell you as much as just looking -- there's no way, for example,
that the AIC can tell you that the fit near the mode is right on, but in the
tails it drops off too fast. You presumably care _how_ a fit fails to capture
what your data say, so that you can decide if that aspect is important or not.
It's unfortunate that there is little general theory to allow you to test
hypotheses of particular families of distributions -- you can test against
specific distributions, but not, for example, things like, "do my data come from
_some_ gamma distribution?" You can do simulations to try to get at that kind
of question, but it's hard to get the kind of hard p-values you might be looking
for.
Hope this helps.
- Peter Perkins
The MathWorks, Inc.
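For reference, the "simple to compute given the LL" step mentioned above can be sketched as follows. This is a minimal illustration in Python using the standard definitions AIC = 2k - 2*LL and BIC = k*ln(n) - 2*LL, where k is the number of fitted parameters and n the number of observations. The function names and the log-likelihood values in the example are made up for illustration and are not output from the Distribution Fitting Tool.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2*LL (smaller is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*LL (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical fits of two families to the same 200 observations:
# name -> (maximized log-likelihood, number of fitted parameters).
fits = {"gamma": (-512.3, 2), "exponential": (-530.9, 1)}
n_obs = 200

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in fits.items()
}

for name, (a, b) in scores.items():
    print(f"{name}: AIC={a:.1f} BIC={b:.1f}")
```

Both criteria only rank candidate fits against each other on the same data; as noted above, neither says where a fit goes wrong.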
Peter,
Thanks a lot for your detailed answer. I guess the problem with
visual analysis in my case is the size of the dataset. Right now I am
looking at the distribution of 256 4-mer DNA motifs in front of each
gene. Therefore, I end up having to visually check 256 sets of
distribution fits if I am using DFITTOOL. If I proceed with 5-mers or
6-mers, it becomes impossible to check all fits within a reasonable time.
Besides, I am interested in some number that would be indicative of
goodness-of-fit between the actual data and the theoretical distribution.
Having specific numbers would allow me to select, say, the 10% of n-mers
that are most different from the default distribution. I do not think
such a selection can be made on the basis of visual inspection.
I also have a couple of questions regarding the statistics you are
talking about. As for KSTEST, I can use it only to find out whether the
data come from a normal distribution or not, is that correct? As for
AIC/BIC, do I use as log(V) (according to Matlab help) the actual
log-likelihood calculated for each distribution?
Thanks a lot in advance.
Viktor
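One way to get a single goodness-of-fit number per motif, as asked above, is the Kolmogorov distance: the largest gap between the empirical CDF of the data and a hypothesized CDF. The distance is defined against any fully specified continuous CDF, not only the normal. The sketch below is a plain Python illustration, not the KSTEST implementation; the helper names, the exponential reference distribution, and the tiny datasets are assumptions made for the example. If the reference parameters are estimated from the same data, the distance is still usable as a ranking score, but the standard Kolmogorov p-values no longer apply.

```python
import math


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


def most_different(datasets, cdf, fraction=0.10):
    """Names of the `fraction` of datasets with the largest KS distance."""
    ranked = sorted(
        datasets,
        key=lambda name: ks_distance(datasets[name], cdf),
        reverse=True,
    )
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical data: motif name -> observed values (e.g. positions).
motifs = {
    "AAAA": [0.1, 0.4, 0.9, 1.7, 2.5],
    "ACGT": [2.0, 2.1, 2.2, 2.3, 2.4],
    "TTTT": [0.2, 0.6, 1.1, 1.3, 3.0],
}
print(most_different(motifs, exponential_cdf(1.0)))
```

With 256 motifs the default fraction keeps the 26 highest-scoring ones, which can then be inspected visually instead of all 256.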