Re: Interpreting results of logistic regression



On Wed, 24 Dec 2008 14:13:35 +0100, Eric Wajnberg
<wajnberg@xxxxxxxxxxxxxx> wrote:

Dear all,

I still have some problems interpreting the results of a logistic
regression.

Here is the dataset that I have:

1 P 2 19
2 P 1 19
3 P 1 19
4 P 1 19
5 P 2 19
6 P 2 19
7 P 1 19
8 P 3 19
9 P 6 19
10 P 3 19
11 P 3 19
12 P 2 18
13 P 4 17
14 P 4 17
15 P 7 16
16 P 4 15
17 P 3 14
18 P 4 13
19 P 3 13
20 P 3 13
21 P 3 11
22 P 3 8
23 P 3 8
1 S 0 19
2 S 0 19
3 S 0 19
4 S 0 19
5 S 0 19
6 S 0 19
7 S 0 19
8 S 0 19
9 S 1 19
10 S 1 19
11 S 0 19
12 S 2 19
13 S 2 17
14 S 0 16
15 S 0 16
16 S 0 16
17 S 0 15
18 S 0 13
19 S 0 13
20 S 0 12
21 S 1 11
22 S 1 8
23 S 0 8


The first column indicates the day, there are two treatments,
"P" and "S", and the last two columns are the number of failures observed
and the total number of trials. The outcome is binomially distributed.

I am interested in testing the day effect, the treatment effect and the
interaction between these two effects on the rate of failure observed.

Parameter values after fitting a logistic regression are the following ones:

Parameter        df  estimate      SE    Chi2  P-value
Intercept         1   -4.9021  0.7993   37.61   <.0001
days              1    0.0878  0.0527    2.78   0.0956
treatment P       1    2.3291  0.8486    7.53   0.0061
days*treatm P     1    0.0034  0.0563    0.00   0.9515
Scale             0    0.8890  0.0000  0.8890   0.8890

Parameters for treatment = "S" are -4.9 and 0.088, so, at day 0, the
estimated rate of failure is 1/(1+exp(4.9))=0.0074 (logit link function),
which appears to be strongly different from zero. This is the first
result that surprises me, especially if you look at a plot of rates of
failure as a function of days for the two treatments. Also, the
days*treatment interaction is not significant, indicating that the two
slopes are statistically the same. This is also hard to believe
if you look at the same plot.
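As a rough check of the arithmetic above, the fitted coefficients can be pushed back through the inverse logit. This is only a sketch: the coefficient values are copied from the table, the variable names are made up for illustration, and treatment "S" is assumed to be the reference level (which is what the table layout suggests).

```python
import math

def inv_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Coefficients copied from the table above; "S" is assumed to be the reference level.
intercept = -4.9021
days = 0.0878
treatment_p = 2.3291
days_by_p = 0.0034

# Fitted failure rates at day 0 for each treatment.
rate_s_day0 = inv_logit(intercept)
rate_p_day0 = inv_logit(intercept + treatment_p)

# On the logit scale a slope is a multiplicative change in the odds per day.
odds_ratio_per_day_s = math.exp(days)
odds_ratio_per_day_p = math.exp(days + days_by_p)

print(f"S, day 0 failure rate: {rate_s_day0:.4f}")
print(f"P, day 0 failure rate: {rate_p_day0:.4f}")
print(f"S, odds ratio per day: {odds_ratio_per_day_s:.3f}")
print(f"P, odds ratio per day: {odds_ratio_per_day_p:.3f}")
```

Note that the slopes compared by the interaction test are these per-day odds ratios, not differences in the raw failure rates seen on a plot.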

"Is there thus something important I am missing in interpreting the
results of the logistic regression?" - Eric asks at the end.

An important oversight may be that you can't actually take a logit
of zero. An implicit increase from nearly-zero to 1 is probably
larger than a measured increase from 1.5 to 4.0.
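To put rough numbers on that, here is a small sketch on the log-odds scale. The 19 trials come from the data; the value 0.1 used as a stand-in for "nearly zero" events is purely an assumption for illustration.

```python
import math

def logit(p):
    """Log-odds of a proportion; raises ValueError for p == 0 because log(0) is undefined."""
    return math.log(p / (1.0 - p))

n = 19           # trials per day early in the series
near_zero = 0.1  # assumed stand-in for "nearly zero" events (illustration only)

# Change on the logit scale for each of the two increases.
small_count_step = logit(1.0 / n) - logit(near_zero / n)
larger_count_step = logit(4.0 / n) - logit(1.5 / n)

# The logit of an observed zero cannot be computed at all.
try:
    logit(0.0 / n)
    zero_is_defined = True
except ValueError:
    zero_is_defined = False

print(f"near-zero -> 1 event: {small_count_step:.3f} logits")
print(f"1.5 -> 4.0 events:    {larger_count_step:.3f} logits")
print(f"logit(0) defined: {zero_is_defined}")
```

The smaller the stand-in for "nearly zero", the larger the first step becomes, which is why the comparison depends so much on how the zeros are handled.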

I look at the data and I see a time-increase for group P, where there
are only 1 or 2 failures up through day 7, and 3+ thereafter.
(And I wonder if the counts are really binomial and independent,
because of their over-consistency).

I look at group S and see a "similar" increase: all counts are 0
through day 8, with the only events happening thereafter.

It is hard to say which group has the greater slope across time,
since the logit of 0 is undefined, and for small proportions the
logit behaves like the log, which runs off to minus infinity as the
proportion approaches 0.


Now, I've tried to fit a simple linear model to these data after
arcsin-transformation, weighting each observation with the number of
trials. Here are the parameter values obtained after the fit:

Parameter         estimate          SE  t Value  P-value
Intercept     0.0145301435  0.04664049     0.31   0.7569
days          0.0054802539  0.00377129     1.45   0.1536
treatment     0.2159431477  0.06594499     3.27   0.0021
days*treatmt  0.0127051385  0.00533082     2.38   0.0218

In this case, the intercept is not significant (failure rate at day 0
for treatment "S") and the test of the interaction between days and
treatment indicates that the slopes for the two treatments significantly
differ, and all this looks much more in accordance with what can be seen
on the plot.

When I once tested the use of the t-test with the arc-sin
transformation, to see if that should be in my arsenal of tools,
I concluded that it was a bad transformation to use for testing
for proportions under 10%. Since the only argument for the
arc-sin (that I recall, anyway) is that it stabilizes variance for
simple ANOVA testing, it is not a good basis for examining
interactions, even if it weren't a lousy approximation for the data
on hand. That is, it makes no pretense (I think) of offering an
"appropriate" metric for comparing trends.

On the other hand, by doing *something* with the values near zero,
the arc-sin is probably better than using the simple additive model.
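A small sketch of what "doing something with the values near zero" looks like in practice. It assumes the usual asin(sqrt(p)) form of the arc-sin transformation, which the original post does not spell out; the function names are made up for illustration.

```python
import math

def arcsine_sqrt(failures, trials):
    """Arc-sin square-root transform of a proportion (assumed form: asin(sqrt(p)))."""
    return math.asin(math.sqrt(failures / trials))

def logit_or_none(failures, trials):
    """Empirical logit, or None when the proportion is exactly 0 or 1."""
    p = failures / trials
    if p <= 0.0 or p >= 1.0:
        return None
    return math.log(p / (1.0 - p))

# A few failure counts out of 19 trials, as seen in the data.
for failures in (0, 1, 2, 4, 7):
    a = arcsine_sqrt(failures, 19)
    l = logit_or_none(failures, 19)
    l_text = "undefined" if l is None else f"{l:.3f}"
    print(f"{failures}/19  arcsine={a:.3f}  logit={l_text}")
```

A zero count gets a finite value (0) under the arc-sin transform, while the empirical logit has nothing to offer for it.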


The main logic for using the logit is the Growth curve, which it
models naturally. IIRC, logit arises from other phenomena, too.
There are phenomena where the probit is more precise as a
technical model, if you are faced with modeling the extremes.

However, neither probit nor logit will do well with zero, and you
have one group that is far too close to zero.


Is there thus something important I am missing in interpreting the
results of the logistic regression?

Thanks for any help on that.

I think I believe the logit results, namely, that you don't have
useful evidence of different slopes; given that it is testing
proportionality (n-fold increase), and given that the data counts
are so few. There are only 8 events in the second group -- 2 events
at the midpoint ("12"); 2 events shortly before, 2 shortly after;
and 2 that are long after.
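For anyone who wants to check the tally, here is a quick count of events by group. The failure counts are copied from the posted data for days 1 through 23; the variable names are just for illustration.

```python
# Failure counts per day (days 1..23), copied from the posted data.
failures_p = [2, 1, 1, 1, 2, 2, 1, 3, 6, 3, 3, 2, 4, 4, 7, 4, 3, 4, 3, 3, 3, 3, 3]
failures_s = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

total_p = sum(failures_p)
total_s = sum(failures_s)

# Days (1-based) on which group S had at least one event.
event_days_s = [day for day, count in enumerate(failures_s, start=1) if count > 0]

print(f"P: {total_p} events, S: {total_s} events")
print(f"S event days: {event_days_s}")
```

With so few events in group S, and none before day 9, any estimate of its slope rests on a handful of observations.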


--
Rich Ulrich