Re: gamfit and zero values
- From: "Tom Lane" <tlane@xxxxxxxxxxxxx>
- Date: Thu, 12 Jul 2007 13:05:56 -0400
"Ben" <junk@xxxxxxxxxxx> wrote in message
news:ef5d399.-1@xxxxxxxxxxxxxxxxxxxxxxxxxx
Can anyone explain why Matlab's gamfit blows up when trying to fit a
data set containing zero values?
Ben, gamfit does maximum likelihood estimation. The likelihood function is
unbounded when there are zeros in the data.
One recommended route is to switch to the method of moments in this case,
and that's what the gamfit function does. Maximum likelihood is often a
good method, but by no means the only one. An advantage of maximum
likelihood is that there are confidence interval calculations based on the
likelihood surface.
In your example, you start with something having a gamma distribution, but
then take the floor of it. You conclude that the results are no longer
accurate. Of course, the truncated values no longer have a gamma
distribution much less the same distribution as the original continuous
data. If you use round() in place of floor(), then the result still
wouldn't have a gamma distribution but you wouldn't be systematically
shifting things down, and the estimates would be a bit closer to the true
values for the original data.
Here's what I might do if I were stuck with integer versions of data that
might inherently have a gamma distribution. First I'd define a function
that took the rounding into account. The function f below computes the
probability of each x value. Then it computes minus their sum on the log
scale to avoid numerical problems and for convenience:
x = round(exprnd(ones(1000,1)));
f = @(p) -sum(log(gamcdf(x+.5,p(1),p(2)) - gamcdf(x-.5,p(1),p(2))));
Then I'd use fminsearch or something better from the Optimization Toolbox to
find the maximum likelihood estimates:
p =p = fminsearch(f,[.5 2])
0.9343 1.1010
This still doesn't give me confidence intervals. To do that I'd have to
examine the likelihood function more carefully. Here's some code to plot
it.
p1 = linspace(.7,1.2,21); p2 = linspace(.8,1.4,19);
for j=1:21, for k=1:19, c(j,k) = f([p1(j),p2(k)]); end, end
contour(p2,p1,c)
And here's something do draw an approximate boundary of a 95% confidence
region computed based on a likelihood ratio statistic:
fmin = f(p)
flr = fmin + chi2inv(.95,2)/2
hold on
[ignore,h] = contour(p2,p1,c,[flr flr]);
set(h,'linestyle',':')
hold off
Some of the details here may be need to be adapted for your real situation,
for example you may need an interval other than [x-0.5,x+0.5] if you're not
rounding to integers. I hope this gives you some ideas, though.
-- Tom
.
- Follow-Ups:
- Re: gamfit and zero values
- From: Ben
- Re: gamfit and zero values
- References:
- gamfit and zero values
- From: Ben
- gamfit and zero values
- Prev by Date: Re: Require a Notepad friendly .txt file output
- Next by Date: Re: running an exe file
- Previous by thread: gamfit and zero values
- Next by thread: Re: gamfit and zero values
- Index(es):
Relevant Pages
|