Re: multicollinearity in regression
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 27 Mar 2006 08:35:32 -0800
Greg Heath wrote:
Anon. wrote:
Paul wrote:
Hi,
After reading numerous articles on the web, I have a couple of
questions.

As Reef Fish is being his normal helpful self, I'll try and make some
sensible suggestions.
1. How do I enter 5 continuous control variables such as LogSize in
regression?
Note I also have 2 continuous independent variables which are not
control variables but on which hypotheses are based. I also have 3
categorical independents which have 2 categories on which hypotheses
are based.
I could use Analysis of Covariance but 2 of the independent variables
are also continuous.
Do I just enter all the variables as Independents into the regression
using Enter method (which is what I have done)?

I don't know what package you're using to do the regression, but you can
simply fit the model as a multiple regression with all of the variables in.
2. The multicollinearity diagnostics indicate that the highest VIF is
2.3 which is less than the rule of thumb value of 4 or 2.5 that I have
seen mentioned. However, the Condition Index is 67.899 and seems to be
related to LOGSIZE variable which has a Variance Proportion of .99. The
other variable with a high Variance Proportion (.99) is the Constant.

I must admit that I don't totally understand this, but I assume that
this is suggesting that LOGSIZE is co-linear with another variable (or a
combination of variables). It may be that you can find out what's going
on by making pairwise plots of the covariates, and this will guide you
to seeing what to do.
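
One way to see how a low VIF and a high Condition Index can coexist is to try it on made-up data. The sketch below is only an illustration under stated assumptions: the data are simulated (not Paul's), the names logsize and x2 are hypothetical, and the diagnostics follow the usual Belsley-style recipe (scale the columns of X, constant included, to unit length, then take the SVD), which may differ in detail from what a given package prints. A covariate that sits far from zero with a small spread is nearly proportional to the column of ones, so it can share a high variance proportion with the Constant even though it is not correlated with the other predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: a log-size variable far from zero with a small spread,
# plus one unrelated predictor.
logsize = rng.normal(loc=20.0, scale=0.5, size=n)
x2 = rng.normal(size=n)

def collinearity_diagnostics(X):
    """Condition indices and variance-decomposition proportions (Belsley-style)."""
    Xs = X / np.linalg.norm(X, axis=0)            # scale each column to unit length
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = s.max() / s                        # one index per singular value
    phi = (Vt.T ** 2) / s ** 2                    # rows: coefficients, columns: components
    props = phi / phi.sum(axis=1, keepdims=True)  # each row sums to 1
    return cond_idx, props

def vifs(Z):
    """VIFs are the diagonal of the inverse correlation matrix of the predictors."""
    return np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))

# Design matrix with the constant, uncentered and centered versions.
X_raw = np.column_stack([np.ones(n), logsize, x2])
X_ctr = np.column_stack([np.ones(n), logsize - logsize.mean(), x2 - x2.mean()])

ci_raw, props_raw = collinearity_diagnostics(X_raw)
ci_ctr, _ = collinearity_diagnostics(X_ctr)
vif = vifs(np.column_stack([logsize, x2]))

print("VIFs (logsize, x2):", vif)
print("largest condition index, raw:", ci_raw.max())
print("variance proportions on that component (const, logsize, x2):", props_raw[:, -1])
print("largest condition index, centered:", ci_ctr.max())
```

If the real data behave like this, the high Condition Index is mostly a statement about LOGSIZE versus the Constant, and centering LOGSIZE would be one way to check that.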
If I remove the variable LOGSIZE, then the coefficient of the Constant
is reduced from -33 to
-.44. None of the signs of the other coefficients are changed although
one Independent variable is now significant which wasn't previously.

The change in the coefficient of the Constant isn't surprising,
especially if some of the covariates are distributed a long way from zero.
I'm guessing that in the model with LOGSIZE, the LOGSIZE coefficient is
pretty small, true? Oh, and the independent variable that has become
significant: how much did the coefficient change? It's possible that it
only moved a bit, from being just non-significant to being just significant.
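
The reasoning about the Constant can be sketched numerically. In ordinary least squares with an intercept, the fitted intercept equals the mean of y minus the sum of each slope times the mean of its covariate, so dropping a covariate whose mean is far from zero moves the intercept by roughly (its slope) x (its mean), even when the slope is small. The example below uses simulated data; the variable names and the coefficients 1.0, 0.1 and 0.5 are assumptions chosen for illustration, not estimates from the original problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
logsize = rng.normal(20.0, 0.5, n)   # hypothetical covariate, far from zero
x2 = rng.normal(0.0, 1.0, n)         # hypothetical covariate centered near zero
y = 1.0 + 0.1 * logsize + 0.5 * x2 + rng.normal(0.0, 1.0, n)

def ols(y, *cols):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_full = ols(y, logsize, x2)   # intercept, logsize slope, x2 slope
b_drop = ols(y, x2)            # intercept, x2 slope

shift = b_drop[0] - b_full[0]            # how far the intercept moved
approx = b_full[1] * logsize.mean()      # slope of logsize times its mean

print("intercept with logsize:   ", b_full[0])
print("intercept without logsize:", b_drop[0])
print("shift:", shift, " slope * mean:", approx)
```

The two printed quantities on the last line should agree closely here because x2 has a mean near zero and is nearly unrelated to logsize; with correlated covariates the other slopes also move and the match is looser.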
So should I just report the 2 regression models, one with LOGSIZE
included and one with it excluded?

I think you should try and understand why there seems to be
multicollinearity: it may be that you can then see a sensible approach
(i.e. one based on the substantive problem, not just a set of numbers).
When I get problems like these, I try to report one analysis, and make
a comment along the lines of "...if we include factor X, we get similar
results".
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.
I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.
This is patently FALSE, and has been debunked numerous times
in sci.stat.math. "Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations". In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.
Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4 and x6 are
significantly correlated it sometimes helps to plot x4 and x6
vs x2.
You would only be wasting time and resources on pairwise
scatter plots.
Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.
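
A small made-up example of the kind of case being argued about: five predictors where the fifth is almost exactly the sum of the other four. This is only a sketch on simulated data, and the variable names and noise level are assumptions. No pairwise correlation is anywhere near 1, yet the eigenvalue analysis of the correlation matrix shows one eigenvalue close to zero, and the matching eigenvector names all five variables as taking part in the dependency.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Hypothetical predictors: x1..x4 independent, x5 nearly their sum.
X4 = rng.normal(size=(n, 4))
x5 = X4.sum(axis=1) + rng.normal(scale=0.05, size=n)
X = np.column_stack([X4, x5])

R = np.corrcoef(X, rowvar=False)                  # 5 x 5 correlation matrix
off_diag = np.abs(R - np.diag(np.diag(R)))
max_pairwise = off_diag.max()                     # largest |r| between two predictors

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(R)
smallest = eigvals[0]
loading = eigvecs[:, 0]                           # eigenvector of the smallest eigenvalue
cond_number = np.sqrt(eigvals[-1] / eigvals[0])   # condition number of the standardized X's

print("largest pairwise |r|:", max_pairwise)
print("eigenvalues:", eigvals)
print("condition number:", cond_number)
print("loadings on the near-zero eigenvalue:", loading)
```

The loadings on the near-zero eigenvalue show which variables enter the near-dependency and with what relative weights, which is information the pairwise correlations alone do not give in a case like this.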
It's all DEJA VU.
Use the Google archives and keywords to find what you missed
in sci.stat.math, since March 2005.
--- Bob.
Hope this helps.
Greg