Re: multicollinearity in regression



Reef Fish wrote:
Greg Heath wrote:
Anon. wrote:
-----SNIP
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.

I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.
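A minimal pure-Python sketch of that screening step follows. In MATLAB, `corrcoef(x)` gives the same matrix. The sketch assumes the predictors are stored as equal-length lists. The function names, the 0.9 cutoff, and the toy data are hypothetical choices for illustration, not a recommended rule.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns."""
    n = len(columns[0])
    cen = []
    for col in columns:
        mean = sum(col) / n
        cen.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in cen]
    return [[sum(x * y for x, y in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(cen, norms)]
            for ci, ni in zip(cen, norms)]


def flag_pairs(R, names, threshold=0.9):
    """Return (name_i, name_j, r) for each pair whose |r| exceeds an
    arbitrary, hypothetical threshold."""
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i][j]) > threshold:
                flagged.append((names[i], names[j], round(R[i][j], 3)))
    return flagged


# Hypothetical toy data: x2 is x1 plus small noise, x3 is unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
x3 = [2, -1, 3, 0, -2, 1, 4, -3]
names = ["x1", "x2", "x3"]

R = correlation_matrix([x1, x2, x3])
pairs = flag_pairs(R, names)
print(pairs)
```

Only the x1/x2 pair is flagged. x3 correlates with x1 at r = -9/42, which is about -0.21.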

This is patently FALSE, and has been debunked numerous times
in sci.stat.math.

WRONG.

Each of those statements is absolutely true. Quantification
of "usually" and "most" does not imply 100% of the time.

"Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations".

Curious reply since I made no such implication to the contrary.
Members of a subset of variables are linearly dependent if a
nontrivial linear combination of them is always zero.
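A small constructed example of that definition follows. It uses hypothetical data in pure Python: four mutually orthogonal, mean-zero ±1 columns a, b, c, d and their sum s. The nontrivial combination a + b + c + d - s is zero at every observation, so the five columns are linearly dependent. Even so, the largest pairwise correlation is only 0.5, so a correlation-matrix screen with a high cutoff would not flag it.

```python
import math

# Hypothetical data: four mutually orthogonal, mean-zero +/-1 columns and their sum.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
d = [1, -1, -1, 1, 1, -1, -1, 1]
s = [ai + bi + ci + di for ai, bi, ci, di in zip(a, b, c, d)]
columns = [a, b, c, d, s]

# Linear dependence: the nontrivial combination 1*a + 1*b + 1*c + 1*d - 1*s
# is zero at every observation.
weights = [1, 1, 1, 1, -1]
combo = [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(len(a))]
print("combination:", combo)  # combination: [0, 0, 0, 0, 0, 0, 0, 0]


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Largest absolute pairwise correlation among the five columns.
max_r = max(
    abs(corr(columns[i], columns[j]))
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
)
print("largest |r|:", max_r)  # largest |r|: 0.5
```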

My point is that, in my 40+ years of data analysis and
statistical modelling, I have found that

1. Most of my multicollinearity problems (say, more than 50% of them)
could be mitigated by removing only 1 or 2 of the linearly dependent
predictor variables.
2. Perusing the correlation coefficient matrix before modelling
usually (say > 50% of the time) indicated which variables warranted
further investigation.

In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.

WRONG. "completely useless" implies 100% of the time.

Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4, and x6 are
significantly correlated, it sometimes helps to plot x4 and x6
against x2.

You would only be wasting the time and resources of pairwise
scatter plots.

I use MATLAB in the interpretive mode. How much time and
resources does it take to type in the command

plot(x(:,2),x(:,4),'b.',x(:,2),x(:,6),'r.')

and then press the return key?

Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.
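A minimal sketch of such an eigen-analysis follows, written in pure Python so it is self-contained. It uses the cyclic Jacobi method, a standard textbook algorithm chosen here only for illustration; in MATLAB one would simply call `eig(corrcoef(x))`. The data are constructed and hypothetical: four orthogonal ±1 columns and their sum s. An eigenvalue near zero signals the near-singularity, and its eigenvector gives the coefficients of the offending combination of standardized variables.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns."""
    n = len(columns[0])
    cen = []
    for col in columns:
        mean = sum(col) / n
        cen.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in cen]
    return [[sum(x * y for x, y in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(cen, norms)]
            for ci, ni in zip(cen, norms)]


def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new A[p][q] is zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A P  (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # A <- P^T A  (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = cs * vkp - sn * vkq, sn * vkp + cs * vkq
    return [A[i][i] for i in range(n)], V


# Hypothetical data: four orthogonal, mean-zero +/-1 columns and their sum s.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
d = [1, -1, -1, 1, 1, -1, -1, 1]
s = [w + x + y + z for w, x, y, z in zip(a, b, c, d)]

R = correlation_matrix([a, b, c, d, s])
eigenvalues, V = jacobi_eigen(R)
i_min = min(range(len(eigenvalues)), key=lambda i: abs(eigenvalues[i]))
v = [V[k][i_min] for k in range(len(V))]
print("eigenvalues:", sorted(round(e, 6) for e in eigenvalues))
# Scale so the last entry is -2: the standardized combination
# z_a + z_b + z_c + z_d - 2*z_s is (numerically) zero.
print("near-null eigenvector:", [round(-2 * x / v[-1], 6) for x in v])
```

Here the eigenvalues are 0, 1, 1, 1, 2 (up to rounding), and the null eigenvector is proportional to (1, 1, 1, 1, -2). For the standardized columns that reads z_a + z_b + z_c + z_d - 2 z_s = 0, which is a + b + c + d - s = 0 in the original units.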

WRONG. "only way" implies 100% of the time.

It's all DEJA VU.

Use the Google archives and keywords to find what you missed
in sci.stat.math since March 2005.

Yes. There is very good stuff there. However, most of what he
missed was senseless arguing over misinterpretations and imprecise
inferences... not recommended for an introduction to the topic. Better
to recommend a good introductory text.

Hope this helps.

Greg
