Re: multicollinearity in regression
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 27 Mar 2006 20:41:14 -0800
Greg Heath wrote:
Reef Fish wrote:
Greg Heath wrote:
-----SNIP
Anon. wrote:
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.
I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.
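Below is a minimal pure-Python sketch of that first step, computing the pairwise Pearson correlation matrix. The helper name `correlation_matrix` and the three toy columns are hypothetical, chosen only for illustration.

```python
import math


def correlation_matrix(columns):
    """Pairwise Pearson correlations; `columns` holds one equal-length list per variable."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    # Center each variable so dot products are (up to a common factor) covariances.
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical toy data: three variables observed five times each.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
x3 = [5.0, 4.0, 3.0, 2.0, 1.0]
R = correlation_matrix([x1, x2, x3])
for row in R:
    print(" ".join(f"{r:6.3f}" for r in row))
```

In this toy data, x1 and x3 are perfectly negatively correlated (r = -1). That is the one situation where a single pairwise correlation does, by itself, expose an exact linear dependence between two variables.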
This is patently FALSE, and had been debunked numerous times
in sci.stat.math.
WRONG.
I had posted a reply to this, but Google seems to have lost it.
Here's an abbreviated version. It may appear as a quasi-duplicate if
Google recovers the one posted half an hour ago.
Each of those statements is absolutely true. Quantification
of "usually" and "most" does not imply 100% of the time.
You missed the point that you were quantifying the WRONG item
(correlation) when the notion and definition of "linear dependence"
in linear ALGEBRA has no correlation content or mention in it.
"Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations".
Curious reply since I made no such implication to the contrary.
Members of a subset of variables are linearly dependent if a
nontrivial linear combination of them is always zero.
Or constant. So why mention correlation and say it was useful?
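Written out in symbols, the definition reads as follows, including the "or constant" case and its consequence for X'X. The notation (x_ij for observation i of variable j, a design matrix X with an intercept column, and a coefficient vector c) is introduced here for illustration and does not come from the posts.

```latex
% x_{ij}: observation i of variable j;  X = [\mathbf{1}, x_1, \dots, x_p] includes an intercept column
x_1,\dots,x_p \text{ are linearly dependent}
\iff \exists\, (c_1,\dots,c_p) \neq \mathbf{0},\ c_0 :\;
\sum_{j=1}^{p} c_j\, x_{ij} = c_0 \quad \text{for all } i = 1,\dots,n.

% With \mathbf{c} = (-c_0, c_1, \dots, c_p)^\top \neq \mathbf{0}:
X\mathbf{c} = \mathbf{0} \;\Longrightarrow\; X^\top X\,\mathbf{c} = \mathbf{0}
\;\Longrightarrow\; X^\top X \text{ is singular.}
```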
My point is that, in my 40+ years of data analysis and
statistical modelling, I have found that
Making errors for 40+ years won't make it right! Better LATE
than NEVER to learn where your errors were!
1. Most (say > 50% of the time) of my multicollinearity
problems could be mitigated by removing only 1 or 2 dependent
variables.
You mean independent variables that are "linearly dependent",
don't you?
2. Perusing the correlation coefficient matrix before modelling
usually (say > 50% of the time) indicated which variables warranted
further investigation.
You would be barking up the wrong tree MOST of the time, including
MISSING the trees when those variables all have LOW correlations,
with each other and with other variables, though perfectly "linearly
dependent" to the point of BLOWING up the regression (because the
X'X matrix is singular).
In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.
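Here is a minimal pure-Python sketch of the situation just described. It uses twenty mutually independent standard-normal predictors plus a twenty-first that is exactly their sum. In the population, each predictor correlates with the sum at only 1/sqrt(20), about 0.22, yet X'X is singular. The helpers `correlation` and `matrix_rank`, the seed, and the sizes n = 200 and p = 20 are all illustrative assumptions.

```python
import random


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def matrix_rank(m, rel_tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) <= rel_tol * scale:
            continue  # no usable pivot: this column depends on the earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, rows):
            f = m[r][c] / m[rank][c]
            for j in range(c, cols):
                m[r][j] -= f * m[rank][j]
        rank += 1
    return rank


rng = random.Random(2006)
n, p = 200, 20
# p mutually independent predictors, plus one that is exactly their sum.
xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
xs.append([sum(col[i] for col in xs) for i in range(n)])

max_r = max(abs(correlation(xs[i], xs[j]))
            for i in range(len(xs)) for j in range(i + 1, len(xs)))

# Design matrix X = [1, x1, ..., x21]; form X'X directly from its columns.
cols = [[1.0] * n] + xs
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

print(f"largest |r| among {len(xs)} predictors: {max_r:.2f}")
print(f"rank of X'X: {matrix_rank(xtx)} of {len(cols)}")
```

Raising p pushes the correlations with the sum toward zero (they behave like 1/sqrt(p)), while X'X stays exactly singular.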
WRONG. "completely useless" implies 100% of the time.
Would 99.99999% of the time useless make you happy?
Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4 and x6 are
significantly correlated it sometimes helps to plot x4 and x6
vs x2.
You would only be wasting the time and resources of pairwise
scatter plots.
I use MATLAB in the interpretive mode. How much time and
resources does it take to type in the command
plot(x(:,2),x(:,4),'b.',x(:,2),x(:,6),'r.')
and then press the return key?
The time of typing those lines; the wasted computer time; and
wasted paper in printing your scatter matrix and plots. NONE of
them is indicative of "linear dependence", so why bother?
Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.
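The following minimal pure-Python sketch shows one way to carry out such an eigen-analysis. Several choices are assumptions of this example rather than anything prescribed in the post: the cyclic Jacobi rotation method, the helper name `jacobi_eigen`, and the use of the centered cross-product matrix (instead of, say, the correlation matrix). The example builds four predictors with x4 = x1 + x2 + x3. The near-zero eigenvalue flags the dependence, and its unit eigenvector should come out close to ±(0.5, 0.5, 0.5, -0.5), which is the offending combination x1 + x2 + x3 - x4.

```python
import math
import random


def jacobi_eigen(a, max_sweeps=50, rel_tol=1e-22):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors); vectors[k] is the unit eigenvector for values[k].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates rotations
    total = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J  (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A  (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[r][i] for r in range(n)] for i in range(n)]
    return values, vectors


rng = random.Random(1)
n = 100
x1, x2, x3 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
x4 = [a + b + c for a, b, c in zip(x1, x2, x3)]  # exact dependence: x1 + x2 + x3 - x4 = 0
cols = [x1, x2, x3, x4]

# Centered cross-product matrix (n - 1 times the sample covariance matrix).
means = [sum(col) / n for col in cols]
centered = [[val - m for val in col] for col, m in zip(cols, means)]
S = [[sum(a * b for a, b in zip(ci, cj)) for cj in centered] for ci in centered]

values, vectors = jacobi_eigen(S)
k = min(range(len(values)), key=lambda i: abs(values[i]))
print("eigenvalues:", " ".join(f"{val:.3g}" for val in values))
print("eigenvector of the smallest:", " ".join(f"{x:+.3f}" for x in vectors[k]))
```

If several eigenvalues are near zero, each corresponding eigenvector describes a separate (near-)dependence. A large ratio of the largest to the smallest eigenvalue is commonly read as a sign of near-multicollinearity, even when no eigenvalue is exactly zero.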
WRONG. "only way" implies 100% of the time.
It is 100% of the time here, including those X's that have r = 1.00000.
It's all DEJA VU.
Use the Google archives and keywords to find what you missed
in sci.stat.math since March 2005.
Yes. There is very good stuff there. However, most of what he
missed was senseless arguing over misinterpretations and imprecise
inferences... not recommended for an introduction to the topic. Better
to recommend a good introductory text.
I could recommend the books I've used to teach the subject, but it
would be lacking the competent INSTRUCTOR (myself) to point out
all the fine points not explicitly mentioned or emphasized in the
books.
You may have even read some of those books and failed to LEARN
the lessons.
Hope this helps.
Greg
I hope it helped others to better understand how you erred, and
continue to err, after 40+ years of doing the WRONG thing!
What a shame, and what a discredit to statistics!
-- Bob.
- References:
- multicollinearity in regression
- From: Paul
- Re: multicollinearity in regression
- From: Anon.
- Re: multicollinearity in regression
- From: Greg Heath
- Re: multicollinearity in regression
- From: Reef Fish
- Re: multicollinearity in regression
- From: Greg Heath