Re: Anyone have a statistics background?




Schizoid Man wrote:
> Bill Riel wrote:
> > In article <dr41bc$s5b$1@xxxxxxxxxxxxxxxxxxxxx>, schiz@xxxxxx says...
> >
> >>I have a quick question...
> >>
> >>I need a numerical approximation to Student's t distribution that is
> >>accurate to double precision.
> >>
> >>Unless I am completely mistaken, the TDIST() function in Excel behaves
> >>very strangely.
> >>
> >>It cannot calculate a value for x < 0 and the result of this function is
> >>the pdf of the distribution, not the cdf unlike NORMSDIST().
> >>
> >>I realize that I could construct a better function using the BETADIST()
> >>function, but I'm not sure whether this will alleviate or compound
> >>whatever inaccuracies are already present in Excel.
> >>
> >>Currently, the one I am leaning towards is Numerical Recipes.
> >
> >
> > I don't know about Excel - I do recall that in the past that some of the
> > statistical functions were questionable.
> >
> > However, I've used the code from Numerical Recipes and it's worked well
> > - it should meet your precision requirements.
>
> Thanks, Bill. That's exactly what I did - a bit convoluted given that
> you needed a Log Gamma and Incomplete Beta integral in order to get the
> t, but probably worth it given the precision I am striving for.

Watch out with NR algorithms. They have a nasty habit of omitting to
mention where they don't work. I know you are only using it for the
t-distribution but betacf is slow and inaccurate for both large a&b and
for small a&b.

The implementation of gammln only guarantees an absolute error of less
than 2e-10. That means that when you exponentiate it the relative error
in the answer will be 2e-10. This is the best you can claim for your
overall answer for the cdf of the t distribution. I believe the GAMMALN
function in EXCEL2003 uses the same implementation or something very
similar.

While we are on similarities of code, I think EXCEL2003 uses equation
6.4.9 in NR, or similar, to calculate the t-distribution figures in
terms of the beta distribution.

Note, however, that as df gets large, df/(df+t^2) tends to 1 and the
BETADIST calculation in EXCEL terms suffers from the fact that
cancellation errors occur when evaluating 1-(df/(df+t^2)). For large df
and -ve t, it is better to calculate cdf_tdist(t,df) as
comp_cdf_beta(t^2/(t^2+df),1/2,df/2) instead of
cdf_beta(df/(t^2+df),df/2,1/2).

Of course EXCEL does not offer a right tail evaluation for the beta
distribution and so cannot take advantage of this obvious method of
making the calculations more accurate.

Mind you, the NR code does not offer a method of evaluating the right
tail for the beta distribution. So the NR algorithm does not appear to
have any advantages over the EXCEL2003 version.

With the example you gave for NORMDIST, you must be using an earlier
version of EXCEL. In that case, I would strongly recommend you use the
VBA code in http://members.aol.com/iandjmsmith/Examples.txt This does
have code which evaluates the beta distribution for large and small
values of the parameters. Additionally it has separate code for the t
distribution which is more accurate and does not suffer from problems
such as df/(t^2+df) overflowing in the calculation of t^2 or
underflowing if calculated as df/t/(t+df/t). In other words it works.

Ian Smith

.



Relevant Pages

  • Re: Find a distribution!
    ... >> I am working on a project at a company in Sweden where the amount of ... >> bacterias in food are simulated in Excel. ... > The exponential and the gamma distribution I think can give you quit ... If you're looking at pathogens in food it may be that you ...
    (sci.stat.math)
  • Re: Autocreating probability distribution
    ... this functionality is not built into Excel. ... In the case of the Normal distribution, the NORMDIST function takes its mean ... and standard deviation as arguments. ... Excel probability functions are not consistent as to whether they calculate ...
    (microsoft.public.excel)
  • Re: Continuity correction.
    ... Actually for Excel 2003 and 2007, NORMSDIST is accurate from z values ... MIcrosoft viewed Excel as a business tool. ... marketing advantage to add to Excel's stat capabilities. ... The t distribution only as a test, not a visual concept of the ...
    (sci.stat.math)
  • Re: How to generate random X given only min, max, mean?
    ... A uniform distribution from 74 to 76 would meet ... Not that any of the Excel random number generators ... McCullough's tests on the RN generators, ...
    (sci.stat.edu)
  • Re: Random Number Generation Dialog Box
    ... >> Am using Excel 2001 and want to generate some random numbers from a normal ... >> distribution. ... A help search tells me about what looks like a wonder dialog ... > Be aware, however, that the ATP's pseudorandom number generator is not ...
    (microsoft.public.mac.office.excel)