Re: multicollinearity in regression



Paul wrote:
Hi,

After reading numerous articles on the web, I have a couple of
questions.

As Reef Fish is being his normal helpful self, I'll try and make some sensible suggestions.

1. How do I enter 5 continuous control variables such as LogSize in
regression?
Note I also have 2 continuous independent variables which are not
control variables but on which hypotheses are based. I also have 3
categorical independents which have 2 categories on which hypotheses
are based.
I could use Analysis of Covariance but 2 of the independent variables
are also continuous.
Do I just enter all the variables as Independents into the regression
using Enter method (which is what I have done)?

I don't know what package you're using to do the regression, but you can simply fit the model as a multiple regression with all of the variables in.
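
To make that concrete, here is a minimal sketch of entering everything at once, assuming made-up data and hypothetical variable names (log_size for a control, x1 for a continuous hypothesis variable, group for a two-category factor coded 0/1). It uses plain Python and the normal equations only to stay self-contained; any statistics package does the same thing more robustly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients for y on a constant plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


# Made-up data, for illustration only
log_size = [10.2, 11.1, 10.7, 11.9, 10.4, 11.5, 10.9, 11.3]  # continuous control
x1 = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0, 3.5, 0.0]                # continuous predictor
group = [0, 1, 0, 1, 1, 0, 0, 1]                             # two-category factor as a dummy

# Response generated exactly from known coefficients, so the fit can be checked
y = [2.0 + 0.5 * s - 1.0 * a + 3.0 * g for s, a, g in zip(log_size, x1, group)]

# All variables entered together: constant, log_size, x1, group
coefs = ols([log_size, x1, group], y)
print([round(c, 3) for c in coefs])
```

The point is only that controls, continuous predictors and 0/1 dummies all sit side by side as columns of the same design matrix; the model does not distinguish between them.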

2. The multicollinearity diagnostics indicate that the highest VIF is
2.3 which is less than the rule of thumb value of 4 or 2.5 that I have
seen mentioned. However, the Condition Index is 67.899 and seems to be
related to LOGSIZE variable which has a Variance Proportion of .99. The
other variable with a high Variance Proportion (.99) is the Constant.

I must admit that I don't totally understand this, but I assume it suggests that LOGSIZE is collinear with another variable (or a combination of variables). Since the Constant also has a high Variance Proportion, the intercept may be part of that combination. You may be able to find out what's going on by making pairwise plots of the covariates, which should guide you towards what to do.
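
One possibility consistent with the reported diagnostics (high Variance Proportions on both LOGSIZE and the Constant) is that LOGSIZE is nearly collinear with the intercept, which can happen when its values lie far from zero relative to their spread. The sketch below uses made-up values, not Paul's data, and assumes the condition index is computed after scaling each column to unit length; with only a constant and one covariate the eigenvalues have a closed form, so no linear algebra library is needed.

```python
import math


def condition_index_with_constant(x):
    """Largest condition index of [constant, x] after scaling both columns to unit length."""
    n = len(x)
    # Cosine of the angle between the constant column and x
    c = sum(x) / (math.sqrt(n) * math.sqrt(sum(v * v for v in x)))
    # The scaled cross-product matrix is [[1, c], [c, 1]],
    # whose eigenvalues are 1 + |c| and 1 - |c|
    return math.sqrt((1 + abs(c)) / (1 - abs(c)))


# Made-up values: far from zero, with little spread
log_size = [10.0, 10.5, 11.0, 11.5, 12.0]
mean = sum(log_size) / len(log_size)
centered = [v - mean for v in log_size]

ci_raw = condition_index_with_constant(log_size)
ci_centered = condition_index_with_constant(centered)
print(ci_raw, ci_centered)
```

If this is what is happening, centering LOGSIZE would bring the condition index down without changing its slope, and the low VIFs would be telling the more relevant story about the predictors themselves.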

If I remove the variable LOGSIZE, then the coefficient of the Constant
is reduced from -33 to -.44. None of the signs of the other coefficients
are changed although one Independent variable is now significant which
wasn't previously.

The change in the coefficient of the Constant isn't surprising, especially if some of the covariates are distributed a long way from zero.
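
A small sketch of why, using made-up numbers chosen only to echo the size of the change (they are not Paul's data): the Constant is the fitted value when every covariate is zero, so dropping a covariate whose mean is far from zero moves the Constant by roughly the slope times that mean.

```python
def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up values: the covariate sits around 11, a long way from zero
log_size = [10.0, 10.5, 11.0, 11.5, 12.0]
y = [-33.0 + 3.0 * v for v in log_size]

# Model with the covariate: the Constant is the prediction at log_size = 0
intercept_with, slope = simple_ols(log_size, y)

# Model without it: the Constant is simply the mean of y
intercept_without = sum(y) / len(y)

mean_x = sum(log_size) / len(log_size)
print(intercept_with, intercept_without, slope * mean_x)
```

With several covariates in the model the arithmetic is less tidy, but the same mechanism applies.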

I'm guessing that in the model with LOGSIZE, the LOGSIZE coefficient is pretty small, true? Oh, and the independent variable that has become significant: how much did the coefficient change? It's possible that it only moved a bit, from being just non-significant to being just significant.

So should I just report the 2 regression models, one with LOGSIZE
included and one with it excluded?

I think you should try to understand why there seems to be multicollinearity: it may be that you can then see a sensible approach (i.e. one based on the substantive problem, not just a set of numbers). When I get problems like these, I try to report one analysis, and make a comment along the lines of "...if we include factor X, we get similar results".

There are people on this list who have a much better understanding of multicollinearity than I do, so hopefully they'll chime in with some sensible advice as well.

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org