Re: multicollinearity in regression
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 27 Mar 2006 20:27:16 -0800
Greg Heath wrote:
Reef Fish wrote:
Greg Heath wrote:-----SNIP
Anon. wrote:
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.
I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.
This is patently FALSE, and had been debunked numerous times
in sci.stat.math.
WRONG.
Each of those statements is absolutely true. Quantification
of "usually" and "most" does not imply 100% of the time.
But you missed the point that you were looking at the WRONG
quantities (correlations), when the only CORRECT quantities
are the EIGENVALUES, which are indicative of "linear dependence" of the
multiple-variables kind.
"Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations".
Curious reply since I made no such implication to the contrary.
If the definition does NOT depend on correlations, why were you
looking at them and saying they were useful, even if they were not
misleading, as MOST of them are, in the cases of real linear
dependence?
Members of a subset of variables are linearly dependent if a
nontrivial linear combination of them is always zero.
Or constant. So, how does that relate to your correlations?
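The gap between the two notions can be checked numerically. What follows is a minimal sketch under assumed conditions, not data from this thread: nine simulated independent predictors plus a tenth that is their exact sum. The names `pearson`, `base`, `total`, `max_r` and `residual` are hypothetical. The nontrivial linear combination is identically zero, yet each pairwise correlation with the sum should sit near 1/sqrt(9), about 0.33.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, k = 500, 9           # assumed sample size and number of free predictors

# Nine independent standard-normal predictors (simulated, hypothetical data).
base = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]

# A tenth predictor that is exactly the sum of the other nine.
total = [sum(col[i] for col in base) for i in range(n)]
cols = base + [total]

# Largest absolute pairwise correlation among all ten predictors.
max_r = max(
    abs(pearson(cols[i], cols[j]))
    for i in range(k + 1)
    for j in range(i + 1, k + 1)
)

# The linear combination x1 + ... + x9 - x10, evaluated row by row.
residual = max(abs(sum(col[i] for col in base) - total[i]) for i in range(n))

print("largest |r| among all pairs:", round(max_r, 3))
print("largest |x1 + ... + x9 - x10|:", residual)
```

With more predictors in the sum, the pairwise correlations shrink further while the dependence stays exact.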
My point is that, in my 40+ years of data analysis and
statistical modelling, I have found that
Making the same errors for 40+ years does not make them right.
1. Most (say > 50% of the time) of my multicollinearity
problems could be mitigated by removing only 1 or 2 dependent
variables.
You meant the independent (predictor) variables, didn't you?
That's what *I* had said, when those are the INDEPENDENT
variables causing the linear dependence. And quite often
NONE of them is highly correlated with any of the other independent
variables.
2. Perusing the correlation coefficient matrix before modelling
usually (say > 50% of the time) indicated which variables warranted
further investigation.
WRONG again, for using the WRONG indicator of what variables
needed closer scrutiny (that's what you meant).
In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.
WRONG. "completely useless" implies 100% of the time.
99.999999% of the time good enough for you? You're arguing
from a childish point of view, completely ignorant of the statistical
substance behind "linear dependence" and its associated
non-mathematical term "multicollinearity" used in statistics.
Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4 and x6 are
significantly correlated it sometimes helps to plot x4 and x6
vs x2.
You would only be waiting the time and resources of pairwise
scatter plots.
I meant "wasting" the time and resources.
I use MATLAB in the interpretive mode. How much time and
resources does it take to type in the command
plot(x(:,2),x(:,4),'b.',x(:,2),x(:,6),'r.')
and then press the return key?
Whatever time it took for you to type that AND the wasted time on
the computer AND the wasted paper in the output. Because no
matter what your computer spewed out, you would NOT be a bit
wiser about your misunderstanding of what multicollinearity and
linear dependence are about.
Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.
WRONG. "only way" implies 100% of the time.
That indeed, is true 100% of the time. It included even the cases
where two X's have correlation 1.000000.
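The eigenvalue diagnosis can be sketched in a few lines. The example below rests on an assumed, hypothetical design rather than anything posted in the thread: four uncorrelated unit-variance predictors plus a fifth equal to their sum, so every correlation with the fifth is exactly 0.5 and no pair looks alarming. `jacobi_eigen` is an illustrative helper using cyclic Jacobi rotations, written out only to keep the sketch free of external libraries. Any symmetric eigen-solver would do.

```python
import math


def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, vectors); column j of `vectors` is the
    eigenvector paired with eigenvalues[j].
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:  # off-diagonal mass is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


# Assumed population correlation matrix: x1..x4 uncorrelated, x5 = x1+x2+x3+x4.
R = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
for i in range(4):
    R[i][4] = R[4][i] = 0.5

eigenvalues, vectors = jacobi_eigen(R)

# The eigenvector of the smallest eigenvalue holds the weights of the dependence.
smallest = min(range(5), key=lambda j: eigenvalues[j])
weights = [vectors[k][smallest] / vectors[0][smallest] for k in range(5)]

print("eigenvalues:", sorted(round(e, 6) for e in eigenvalues))
print("weights on standardized predictors:", [round(w, 6) for w in weights])
```

For this matrix the smallest eigenvalue is zero and its eigenvector is proportional to (1, 1, 1, 1, -2): in standardized units, the first four predictors sum to twice the fifth. That is the dependence itself, which the table of 0.5 correlations does not spell out.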
It's all DEJA VU.
Use the Google archives and keywords to find what you missed
in sci.stat.math since March 2005.
Yes. There is very good stuff there. However, most of what he
missed was senseless arguing over misinterpretations and imprecise
inferences... not recommended for an introduction to the topic. Better
to recommend a good introductory text.
All the textbooks I've used were GOOD introductory texts, provided
the text was TAUGHT by a competent instructor (me) to point the
students to the points overlooked by most, such as yourself, who
may have indeed read some useable introductory text, but MISSED
all of the essential points because you were not properly taught
or your self-taught skills were deficient.
Hope this helps.
Greg
It certainly helped reveal and reconfirm your weaknesses on
the subject. Hope others learn from this abbreviated discussion.
-- Bob.