Re: Anderson-Darling test for discrete distribution fitting



On 5/2/2007 11:00 AM, David Jones wrote:
Samik R. wrote:
On 5/2/2007 3:47 AM, David Jones wrote:
My other thought on this topic is that you should check that you
have a reasonable formulation for the AD statistic, suited to
discrete distributions and to tied observations. David Jones


Thanks for all the comments. David, can you expand a bit on your
last comment? Currently I am using the conventional form of the AD
statistic for both discrete and continuous distributions (the same
one given by NIST:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm). Do
you mean I should change this in some way to accommodate discrete
distributions?
Regards,
-Samik

(i) the AD test, as usually done, is essentially a test of whether certain transformed values of the data have a uniform distribution ... where the transformed values are obtained as F(X) for observations X from a distribution having F() as its CDF ... but this only produces a uniform distribution if F is continuous. The usual asymptotic distribution (for no fitted parameters) is derived for this case. This may be one reason for wanting to do simulations anyway (to overcome this problem).
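
To make point (i) concrete, here is a minimal Python sketch (not from the original post) that computes P(F(X) <= u) exactly for a discrete example. The Poisson distribution, the mean of 2 and all function names are assumptions chosen purely for illustration. If F(X) were uniform, the printed probability would equal u.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(poisson_pmf(j, mean) for j in range(k + 1))

def prob_pit_at_most(u, mean, kmax=50):
    """P(F(X) <= u); this would equal u if F(X) were uniform on (0, 1)."""
    total = 0.0
    for k in range(kmax + 1):
        # F is non-decreasing, so we can stop at the first k with F(k) > u.
        if poisson_cdf(k, mean) > u:
            break
        total += poisson_pmf(k, mean)
    return total

for u in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(u, round(prob_pit_at_most(u, 2.0), 4))
```

For a mean of 2 the value at u = 0.5 is about 0.406 rather than 0.5, because F(X) can only take the values F(0), F(1), F(2), ... and so cannot be uniform.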

(ii) there is a question of how F and (1-F) should be interpreted in the test statistic in the case of a discrete distribution ... I don't know what is usually done. Possibly ...
F = Prob(X<=Xobs) and 1-F=Prob(X>=Xobs)
or
F = Prob(X<=Xobs) and 1-F=Prob(X>Xobs), which is no longer equal to Prob(X>=Xobs) when Xobs has positive probability.
If you do simulations you get results for your particular choice.
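
A rough sketch of what such a simulation might look like in Python is given below. It is not from the original post: it assumes a fully specified Poisson null (no fitted parameters), and the names `ad_statistic`, `simulate_null` and the `tail` labels "ge" / "gt" are made up here to stand for the two conventions Prob(X>=Xobs) and Prob(X>Xobs).

```python
import math
import random

def pmf(k, mean):
    """Poisson probability P(X = k)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def lower(k, mean):
    """P(X <= k)."""
    return sum(pmf(j, mean) for j in range(k + 1))

def upper(k, mean, terms=40):
    """P(X >= k), summed directly (truncated) to avoid cancellation in 1 - F."""
    return sum(pmf(j, mean) for j in range(k, k + terms))

def ad_statistic(sample, mean, tail="gt"):
    """Conventional AD formula with a chosen meaning for the upper tail."""
    xs = sorted(sample)
    n = len(xs)
    shift = 0 if tail == "ge" else 1  # "ge": P(X >= x), "gt": P(X > x)
    total = 0.0
    for i in range(1, n + 1):
        f_low = lower(xs[i - 1], mean)
        f_up = upper(xs[n - i] + shift, mean)
        total += (2 * i - 1) * (math.log(f_low) + math.log(f_up))
    return -n - total / n

def draw_poisson(mean, rng):
    """Draw one Poisson variate by inverting the CDF."""
    u = rng.random()
    k, p = 0, math.exp(-mean)
    c = p
    while u > c and k < 1000:
        k += 1
        p *= mean / k
        c += p
    return k

def simulate_null(n, mean, tail, reps=200, seed=1):
    """Sorted AD statistics for samples drawn from the null distribution."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [draw_poisson(mean, rng) for _ in range(n)]
        stats.append(ad_statistic(sample, mean, tail))
    return sorted(stats)

def critical_value(sorted_stats, level=0.95):
    """Empirical quantile of the simulated statistics."""
    idx = min(len(sorted_stats) - 1, int(level * len(sorted_stats)))
    return sorted_stats[idx]

for tail in ("ge", "gt"):
    print(tail, critical_value(simulate_null(20, 3.0, tail)))
```

The two conventions give different statistics for the same data, so the simulated critical value has to be computed with the same convention that is applied to the observed sample. If parameters are fitted, they would presumably need to be re-estimated inside each replicate as well.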

(iii) there is also the possibility of revising the weights being used for the separate terms, which are related to the idea of "plotting positions". If you rewrite the usual formulation by separating it into two parts, reversing the summation on one and then recombining so that each observation appears in only one term, you can get a better idea of what is going on. Each term is then a function like
w log(y) + (1-w) log(1-y)
where w is a weight and y is the probability point associated with an observation. Treating y as free, this expression is maximised at y=w (so the corresponding contribution to the statistic, which enters with a negative sign, is minimised there), so that effectively w is the target for what y should be in the test statistic. For a discrete distribution you might want the target for the lowest observation to be 1/N, rather than 1/(2N).
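
The rearrangement in (iii) can be checked numerically. The sketch below is an illustration added here, not code from the thread: it assumes the probability points y are given with 1-y used for the upper tail, and takes the default weights as w_i = (2i-1)/(2N), which is what the recombination gives for the conventional statistic. The `weights` argument is a hypothetical hook for trying other targets.

```python
import math

def ad_conventional(y):
    """Usual form: -N - (1/N) * sum (2i-1) [log y_i + log(1 - y_{N+1-i})]."""
    y = sorted(y)
    n = len(y)
    s = sum((2 * i - 1) * (math.log(y[i - 1]) + math.log(1.0 - y[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def ad_weighted(y, weights=None):
    """Recombined form: -N - 2 * sum [w_i log y_i + (1 - w_i) log(1 - y_i)]."""
    y = sorted(y)
    n = len(y)
    if weights is None:
        # Weights implied by the conventional statistic; lowest is 1/(2N).
        weights = [(2 * i - 1) / (2.0 * n) for i in range(1, n + 1)]
    total = 0.0
    for w, yi in zip(weights, y):
        total += w * math.log(yi) + (1.0 - w) * math.log(1.0 - yi)
    return -n - 2.0 * total

y = [0.12, 0.35, 0.40, 0.58, 0.91]
print(ad_conventional(y), ad_weighted(y))

# The statistic is smallest when every y_i sits exactly on its target w_i.
targets = [0.1, 0.3, 0.5, 0.7, 0.9]
print(ad_weighted(targets))
```

Passing a different `weights` list moves the targets, so the null distribution of the modified statistic would again have to come from simulation.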

Hope this helps

David Jones


Thanks for the pointers. I will go ahead and do some simulations as suggested by you and Herman.
Regards,
-Samik