Re: multicollinearity in regression




Greg Heath wrote:
Reef Fish wrote:
Greg Heath wrote:
Anon. wrote:
-----SNIP
There are people on this list who have a much better understanding of
multicollinearity than I do, so hopefully they'll chime in with some
sensible advice as well.

I always find it helpful to calculate the correlation coefficient
matrix of all variables. This will give you pairwise correlation
information which usually helps to explain most problems with
multicollinearity.

This is patently FALSE, and has been debunked numerous times
in sci.stat.math.

WRONG.

I had posted a reply to this but Google seems to have lost it.
Here's an abbreviated version. It may appear as a quasi-duplicate if
Google recovers the one posted half an hour ago.


Each of those statements is absolutely true. Quantification
of "usually" and "most" does not imply 100% of the time.

You missed the point that you were quantifying the WRONG item
(correlation) when the notion and definition of "linear dependence"
in linear ALGEBRA has no correlation content or mention in it.

"Linear dependence" is a notion in LINEAR
ALGEBRA, whose definition does NOT depend on any notion of
"correlations".

Curious reply since I made no such implication to the contrary.
Members of a subset of variables are linearly dependent if a
nontrivial linear combination of them is always zero.

Or constant. So why mention correlation and say it was useful?


My point is that, in my 40+ years of data analysis and
statistical modelling, I have found that

Making errors for 40+ years won't make it right! Better LATE
than NEVER to learn where your errors were!



1. Most (say > 50% of the time) of my multicollinearity
problems could be mitigated by removing only 1 or 2 dependent
variables.

You meant independent variables that are "linearly dependent",
didn't you?


2. Perusing the correlation coefficient matrix before modelling
usually (say > 50% of the time) indicated which variables warranted
further investigation.

You would be barking up the wrong tree MOST of the time, including
MISSING the trees when those variables all have LOW correlations,
with each other and with other variables, though perfectly "linearly
dependent" to the point of BLOWING up the regression (for reasons
of a singular X'X matrix).
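
The low-correlation case described above can be sketched with a small
constructed example (not data from this thread): sixteen mutually
orthogonal +1/-1 columns taken from a Walsh-Hadamard pattern, plus a
seventeenth column that is their exact sum. The data, the sizes, and the
helper names `walsh`, `pearson` and `exact_rank` are all illustrative
assumptions. Plain Python, standard library only.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

N_ROWS, N_BASE = 32, 16  # illustrative sizes, not from the thread

def walsh(i, j):
    """Entry (i, j) of a Sylvester-type Hadamard matrix: +1 or -1."""
    return -1 if bin(i & j).count("1") % 2 else 1

# Columns 1..16 are mutually orthogonal and each sums to zero.
columns = [[walsh(i, j) for i in range(N_ROWS)] for j in range(1, N_BASE + 1)]
# The last column is an exact linear combination (the sum) of the others.
columns.append([sum(col[i] for col in columns) for i in range(N_ROWS)])

def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def exact_rank(rows):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

max_abs_r = max(abs(pearson(x, y)) for x, y in combinations(columns, 2))
rows = [list(r) for r in zip(*columns)]  # observations as rows of X
rank = exact_rank(rows)

print(f"largest |r| between any two columns: {max_abs_r:.2f}")  # 0.25
print(f"rank of X: {rank} of {len(columns)} columns")           # 16 of 17
```

No pairwise correlation exceeds 0.25, yet X is rank deficient, so X'X
is exactly singular. With k equal-variance, mutually uncorrelated
predictors plus their sum, each correlation with the sum is 1/sqrt(k),
so it shrinks as more variables take part in the dependence.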

In that respect, correlations are completely
USELESS (except the case r = 1.000000) in diagnosing
multicollinearity problems.

WRONG. "completely useless" implies 100% of the time.

Would 99.99999% of the time useless make you happy?


Additional insight, if needed, can be obtained
from pairwise scatter plots. For example, if x2, x4 and x6 are
significantly correlated it sometimes helps to plot x4 and x6
vs x2.

You would only be wasting the time and resources of pairwise
scatter plots.

I use MATLAB in the interpretive mode. How much time and
resources does it take to type in the command

plot(x(:,2),x(:,4),'b.',x(:,2),x(:,6),'r.')

and then press the return key?

The time of typing those lines; the wasted computer time; and
wasted paper in printing your scatter matrix and plots. NONE of
them is indicative of "linear dependence", so why bother?


Eigenvalue and eigenvector analysis of the X's is the only way
to sort out and understand the underlying multicollinearity.

WRONG. "only way" implies 100% of the time.

It is 100% of the time here, including those X's that have r = 1.00000.
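
As a hedged illustration of the eigen-analysis route (a constructed
example, not data from this thread): three orthogonal +1/-1 columns and
a fourth equal to their sum. The eigenvector of the correlation matrix
that belongs to the (near-)zero eigenvalue names the variables taking
part in the dependence. `jacobi_eigen` is a hypothetical helper written
for this sketch using classical Jacobi rotations; in practice one would
call a library eigen-solver instead.

```python
import math

def walsh(i, j):
    """Entry (i, j) of a Sylvester-type Hadamard matrix: +1 or -1."""
    return -1 if bin(i & j).count("1") % 2 else 1

# Three orthogonal, zero-mean columns (8 rows) and their exact sum.
cols = [[walsh(i, j) for i in range(8)] for j in (1, 2, 4)]
cols.append([sum(c[i] for c in cols) for i in range(8)])

def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jacobi_eigen(matrix, max_sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors (as columns of v) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = (c * a[k][p] - s * a[k][q],
                                        s * a[k][p] + c * a[k][q])
                    v[k][p], v[k][q] = (c * v[k][p] - s * v[k][q],
                                        s * v[k][p] + c * v[k][q])
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = (c * a[p][k] - s * a[q][k],
                                        s * a[p][k] + c * a[q][k])
    return [a[i][i] for i in range(n)], v

R = [[pearson(x, y) for y in cols] for x in cols]  # correlation matrix
eigenvalues, vectors = jacobi_eigen(R)

idx = min(range(len(eigenvalues)), key=lambda k: eigenvalues[k])
smallest = eigenvalues[idx]
vec = [vectors[k][idx] for k in range(len(cols))]
# Rescale so the last coefficient is -1, which makes the relation readable.
direction = [-c / vec[-1] for c in vec]

print("smallest eigenvalue:", smallest)
print("coefficients on standardized x1..x4:", direction)
```

The smallest eigenvalue is zero up to rounding, and its eigenvector
loads equally on the first three standardized variables and with
opposite sign on the fourth, which is exactly the relation x4 = x1 + x2
+ x3 built into the data.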


It's all DEJA VU.

Use the Google archives and keywords to find what you missed
in sci.stat.math since March 2005.

Yes. There is very good stuff there. However, most of what he
missed was senseless arguing over misinterpretations and imprecise
inferences... not recommended for an introduction to the topic. Better
to recommend a good introductory text.

I could recommend the books I've used to teach the subject, but it
would be lacking the competent INSTRUCTOR (myself) to point out
all the fine points not explicitly mentioned or emphasized in the
books.

You may have even read some of those books and failed to LEARN
the lessons.

Hope this helps.

Greg

I hope it helped others to better understand how you erred, and
continue to err, after 40+ years of doing the WRONG thing!

What a shame, and what a discredit to statistics!

-- Bob.



