Re: Principal Component Analysis
- From: sangdonlee@xxxxxxxxx
- Date: Thu, 15 May 2008 06:46:59 -0700 (PDT)
Paige Miller wrote:
When you have 30 input variables, and they are (highly) correlated,
the problem isn't the method -- in this case PLS -- the problem is
that you cannot in any way separate the distinct and independent
effects of each of the individual input variables. Logically, this
cannot be done.
Paige wrote a very nice summary of the difficulty (or
impossibility) of interpreting regression coefficients when the
predictors are correlated, and I fully agree with his
recommendation of PLS.
I just want to emphasize this one more time. If variables are highly
correlated, the distinct and independent effects of individual
predictors can NOT be estimated NO MATTER WHAT (even with zillions of
data points, sophisticated non-linear methods, etc.).
This is logically impossible and mathematically impossible as well. It
is the problem known as collinearity, or ill-conditioning.
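To make the ill-conditioning concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration (two predictors, true coefficients of 1 each, the noise levels, the sample size, the helper names). It refits the same model on 20 simulated data sets and compares how much the individual coefficient b1 moves around with how much the sum b1 + b2 moves around.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks toward zero as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simulate(seed, n=200):
    """One simulated data set with two almost identical predictors (illustrative values)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + 0.01 * rng.gauss(0, 1) for a in x1]           # x2 is x1 plus a tiny perturbation
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]   # true b1 = b2 = 1
    return fit_two_predictors(x1, x2, y)

fits = [simulate(seed) for seed in range(20)]
b1_values = [b1 for b1, _ in fits]
sum_values = [b1 + b2 for b1, b2 in fits]

spread_b1 = statistics.stdev(b1_values)    # instability of the individual coefficient
spread_sum = statistics.stdev(sum_values)  # instability of the combined effect
mean_sum = statistics.fmean(sum_values)

print(f"std of b1 across data sets:      {spread_b1:.3f}")
print(f"std of b1 + b2 across data sets: {spread_sum:.3f}")
print(f"mean of b1 + b2:                 {mean_sum:.3f}")
```

In this setup the combined effect is pinned down well while the split between the two predictors is not, which is the point made above: the data carry information about the predictors jointly, not about each one separately.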
I found chapters 12 and 13 of “Data Analysis and Regression”
by F. Mosteller and J. W. Tukey (Woes of regression
coefficients) very enlightening for the interpretation of regression
coefficients.
Interpretation assumes a cause-and-effect relationship. In my humble
opinion, statistics has not been successful in understanding
cause-and-effect relationships, let alone dynamic behavior. It has been
repeated so many times: correlation is not causation. Well, this
is not the fault of statistics but a problem of
OBSERVATIONAL data. What I mean is that if the data come from controlled
experiments in which the predictors are independent and the samples are
balanced and randomly allocated, the interpretation and the effects
of individual predictors are very straightforward. However, the majority
of data in real life are observational. Hence the difficulty of
interpretation.
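The contrast between designed and observational data can be sketched in a few lines. This is an illustration under assumed values only (a balanced 2x2 factorial design, true effects of 1 and 2, a predictor correlation of about 0.9 in the observational case, and the helper name `slopes`). It compares the coefficient of x1 estimated alone with the coefficient of x1 estimated together with x2.

```python
import math
import random
import statistics

def slopes(x1, x2, y):
    """Return (slope of y on x1 alone, coefficient of x1 with x2 also in the model)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    simple = s1y / s11
    adjusted = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
    return simple, adjusted

rng = random.Random(1)
n = 400

# Designed experiment: balanced 2x2 factorial, so x1 and x2 are exactly orthogonal.
design = [(-1, -1), (-1, 1), (1, -1), (1, 1)] * (n // 4)
d_x1 = [a for a, _ in design]
d_x2 = [b for _, b in design]
d_y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in design]

# Observational data: x2 tracks x1 (correlation about 0.9), same true effects.
o_x1 = [rng.gauss(0, 1) for _ in range(n)]
o_x2 = [0.9 * a + math.sqrt(1 - 0.9 ** 2) * rng.gauss(0, 1) for a in o_x1]
o_y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(o_x1, o_x2)]

designed_simple, designed_adjusted = slopes(d_x1, d_x2, d_y)
observed_simple, observed_adjusted = slopes(o_x1, o_x2, o_y)

print(f"designed:      x1 alone {designed_simple:.3f}, with x2 {designed_adjusted:.3f}")
print(f"observational: x1 alone {observed_simple:.3f}, with x2 {observed_adjusted:.3f}")
```

With orthogonal predictors the estimate for x1 does not depend on whether x2 is in the model, so "the effect of x1" has one answer. With correlated predictors the answer changes with the rest of the model, and that is where the interpretation trouble starts.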
By the way, here is a rough summary:
- Multiple linear regression (also ALL the methods in GENERALIZED
LINEAR REGRESSION) accounts for the maximum variance of Y.
- PCA (and thus PCR) accounts for the maximum variance of X.
- PLS accounts for the variance of X AND Y together (its components
maximize the covariance between X and Y).
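The three criteria in this summary can be compared directly in a toy case. The sketch below is an illustration under assumed data (two predictors, made-up coefficients and variances), restricted to the first component and to ordinary linear regression. For every unit direction w it scores t = Xw by three criteria: the variance of t (PCA), the squared covariance of t with y (PLS, first component), and the squared correlation of t with y (least squares, i.e. the share of the variance of Y explained). It then finds the best direction for each by a simple grid search.

```python
import math
import random
import statistics

rng = random.Random(2)
n = 500
x1 = [rng.gauss(0, 2) for _ in range(n)]        # high-variance predictor
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]    # correlated with x1
y = [0.2 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

def centered(v):
    m = statistics.fmean(v)
    return [a - m for a in v]

def cov(u, v):
    # Sample covariance; inputs are assumed to be centered already.
    return sum(a * b for a, b in zip(u, v)) / (len(u) - 1)

x1, x2, y = centered(x1), centered(x2), centered(y)
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
c1, c2, syy = cov(x1, y), cov(x2, y), cov(y, y)

def criteria(theta):
    """Score the direction w = (cos theta, sin theta) by the three criteria."""
    w1, w2 = math.cos(theta), math.sin(theta)
    var_t = w1 * w1 * s11 + 2 * w1 * w2 * s12 + w2 * w2 * s22  # variance of t = X w
    cov_ty = w1 * c1 + w2 * c2                                  # covariance of t with y
    return var_t, cov_ty ** 2, cov_ty ** 2 / (var_t * syy)      # PCA, PLS, least squares

angles = [math.pi * k / 1800 for k in range(1800)]  # 0.1-degree grid on [0, pi)
pca_angle = max(angles, key=lambda a: criteria(a)[0])
pls_angle = max(angles, key=lambda a: criteria(a)[1])
ols_angle = max(angles, key=lambda a: criteria(a)[2])

for name, angle in [("PCA", pca_angle), ("PLS", pls_angle), ("OLS", ols_angle)]:
    print(f"{name} direction: {math.degrees(angle):.1f} degrees")
```

The three criteria generally pick three different directions: PCA looks only at X, least squares looks only at how well Y is reproduced, and PLS trades the two off.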
I find it helpful to think about the intrinsic structure among the
predictors before applying any method: whether some predictors are
causally related and should therefore be combined or deleted.
Hope this helps.
Sangdon Lee, Ph.D.
GM Tech Center.