Re: multicollinearity in regression
- From: "Greg Heath" <heath@xxxxxxxxxxxxxxxx>
- Date: 27 Mar 2006 19:05:30 -0800
Reef Fish wrote:
Greg Heath wrote:-----SNIP
Anon. wrote:
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.
I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.
This is patently FALSE, and had been debunked numerous times
in sci.stat.math.
WRONG.
Each of those statements is absolutely true. Quantification
of "usually" and "most" does not imply 100% of the time.
"Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations".
Curious reply since I made no such implication to the contrary.
Members of a subset of variables are linearly dependent if a
nontrivial linear combination of them is always zero.
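To make that definition concrete, here is a minimal Python sketch (not from the thread) that tests a set of columns for exact linear dependence by computing the rank with exact fraction arithmetic. The function name column_rank and the +/-1 data, in which a fourth column is the sum of the first three, are hypothetical and chosen only for illustration. Real data is rarely exactly dependent, so this shows the definition rather than a practical diagnostic.

```python
from fractions import Fraction
from itertools import product


def column_rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on the earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical data: x1, x2, x3 take every +/-1 combination; x4 = x1 + x2 + x3.
base = [list(p) for p in product([-1, 1], repeat=3)]
with_sum = [row + [sum(row)] for row in base]

print("columns:", len(base[0]), "rank:", column_rank(base))
print("columns:", len(with_sum[0]), "rank:", column_rank(with_sum))
```

A rank smaller than the number of columns means some nontrivial linear combination of the columns is zero in every row, which is the definition given above.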
My point is that, in my 40+ years of data analysis and
statistical modelling, I have found that
1. Most (say > 50% of the time) of my multicollinearity
problems could be mitigated by removing only 1 or 2 dependent
variables.
2. Perusing the correlation coefficient matrix before modelling
usually (say > 50% of the time) indicated which variables warranted
further investigation.
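Point 2 could be sketched in Python as follows. This is an illustration, not the procedure used in the thread: the helper names pearson and flag_pairs, the 0.9 cutoff, and the three data columns are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_pairs(columns, threshold=0.9):
    """Return (i, j, r) for every pair of columns with |r| >= threshold."""
    flagged = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                flagged.append((i, j, r))
    return flagged


# Hypothetical data: x2 is roughly 2 * x1, x3 is unrelated to both.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]

for i, j, r in flag_pairs([x1, x2, x3]):
    print(f"columns {i} and {j} warrant a closer look: r = {r:.4f}")
```

The cutoff is arbitrary. A flagged pair is only a prompt for further investigation, and an empty list does not by itself rule out a dependence involving three or more variables.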
In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.
WRONG. "completely useless" implies 100% of the time.
Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4 and x6 are
significantly correlated it sometimes helps to plot x4 and x6
vs x2.
You would only be wasting time and resources on pairwise
scatter plots.
I use MATLAB in the interpretive mode. How much time and
resources does it take to type in the command
plot(x(:,2),x(:,4),'b.',x(:,2),x(:,6),'r.')
and then press the return key?
Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.
WRONG. "only way" implies 100% of the time.
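For readers who have not seen the eigenvalue approach, a rough Python sketch follows. It is an illustration under stated assumptions, not anyone's method from the thread: the data is hypothetical (x4 = x1 + x2 + x3 over all +/-1 combinations), and jacobi_eigen is a hand-rolled Jacobi rotation routine kept only so the example has no dependencies; in practice a library routine such as numpy.linalg.eigh would do the same job. A near-zero eigenvalue of the correlation matrix signals a near-dependence, and the matching eigenvector shows which variables take part.

```python
from itertools import product
from math import atan2, cos, sin, sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        norm = sqrt(sum((v - mean) ** 2 for v in col))
        z.append([(v - mean) / norm for v in col])  # centred, unit length
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def jacobi_eigen(sym, tol=1e-12, max_rotations=100):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    a = [row[:] for row in sym]
    n = len(a)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(max_rotations):
        # Pick the largest off-diagonal entry and rotate it to zero.
        p, q = max(pairs, key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < tol:
            break
        theta = 0.5 * atan2(2 * a[p][q], a[q][q] - a[p][p])
        g = [[float(i == j) for j in range(n)] for i in range(n)]
        g[p][p] = g[q][q] = cos(theta)
        g[p][q], g[q][p] = sin(theta), -sin(theta)
        g_t = [list(row) for row in zip(*g)]
        a = matmul(matmul(g_t, a), g)  # similarity transform keeps eigenvalues
        v = matmul(v, g)               # accumulate the eigenvectors
    return [a[i][i] for i in range(n)], v


# Hypothetical data: x1, x2, x3 are all +/-1 combinations; x4 = x1 + x2 + x3.
base = [list(p) for p in product([-1, 1], repeat=3)]
cols = [list(c) for c in zip(*base)]
cols.append([sum(row) for row in base])

R = correlation_matrix(cols)
eigvals, eigvecs = jacobi_eigen(R)
k = min(range(len(eigvals)), key=lambda i: eigvals[i])
largest_r = max(abs(R[i][j]) for i in range(4) for j in range(i + 1, 4))

print("largest pairwise |r|:", round(largest_r, 3))
print("smallest eigenvalue:", eigvals[k])
print("its eigenvector:", [round(eigvecs[i][k], 3) for i in range(4)])
```

In this constructed case no pairwise correlation exceeds about 0.58, yet one eigenvalue is zero to rounding error and its eigenvector loads on all four variables. How often real data looks like this is the very point the posters disagree on.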
It's all DEJA VU.
Use the Google archives and keywords to find what you missed
in sci.stat.math since March 2005.
Yes. There is very good stuff there. However, most of what he
missed was senseless arguing over misinterpretations and imprecise
inferences... not recommended for an introduction to the topic.
Better to recommend a good introductory text.
Hope this helps.
Greg