Re: Principal Component Analysis



Paige Miller wrote:
When you have 30 input variables, and they are (highly) correlated,
the problem isn't the method -- in this case PLS -- the problem is
that you cannot in any way separate the distinct and independent
effects of each of the individual input variables. Logically, this
cannot be done.

Paige wrote a very nice summary of the difficulty (or impossibility)
of interpreting regression coefficients when predictors are
correlated, and I fully agree with his recommendation of PLS.

I just want to emphasize this one more time. If variables are highly
correlated, the distinct and independent effects of individual
predictors can NOT be estimated NO MATTER WHAT (even with zillions of
data points, any sophisticated non-linear method, etc.).
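
A minimal sketch of the limiting case, assuming the two predictors are
perfectly correlated (x2 is a copy of x1). The data and the coefficient
pairs are made up for illustration. Very different coefficient pairs
give exactly the same predictions, so adding more rows of the same kind
cannot pick one pair over another.

```python
# Hypothetical data: x2 is an exact copy of x1 (correlation = 1).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = list(x1)

def predict(b1, b2):
    """Predictions of the linear model y = b1*x1 + b2*x2."""
    return [b1 * u + b2 * v for u, v in zip(x1, x2)]

# Three very different "effects" attributed to x1 and x2 ...
candidates = [(2.0, 0.0), (0.0, 2.0), (5.0, -3.0)]
predictions = [predict(b1, b2) for b1, b2 in candidates]

# ... give the same fitted values; only the sum b1 + b2 is identifiable.
all_equal = all(p == predictions[0] for p in predictions)
print(all_equal)
```

With high but imperfect correlation the fitted values are no longer
exactly equal, but they stay close enough that noise swamps the
difference.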

Logically impossible and mathematically impossible as well. This is
related to collinearity or ill-conditioning.
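
To put rough numbers on the ill-conditioning, here is a small sketch
assuming just two standardized predictors with correlation r. The
function names are my own. For this two-predictor case the variance
inflation factor is 1/(1 - r^2), and the correlation matrix
[[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r.

```python
def vif(r):
    """Variance inflation factor for two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)

def condition_number(r):
    """Largest over smallest eigenvalue of the matrix [[1, r], [r, 1]]."""
    return (1.0 + abs(r)) / (1.0 - abs(r))

# Both quantities blow up as |r| approaches 1.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r={r:<5} VIF={vif(r):8.2f}  cond={condition_number(r):8.2f}")
```

The variance of each estimated coefficient is multiplied by the VIF
relative to the uncorrelated case, which is why the individual
coefficients become unstable as r grows.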

I found chapters 12 and 13 of “Data Analysis and Regression” by
F. Mosteller and J. W. Tukey (Woes of regression coefficients) very
enlightening on the interpretation of regression coefficients.

Interpretation assumes a cause-and-effect relationship. In my humble
opinion, statistics has not been successful in understanding
cause-and-effect relationships, let alone dynamic behavior. It has
been repeated so many times: correlation is not causation. Well, this
is not the fault of statistics but a problem of OBSERVATIONAL data.
What I mean is that if data come from controlled experiments in which
the predictors are independent and the samples are balanced and
randomly allocated, the interpretation and the effects of individual
predictors are very straightforward. However, the majority of data in
real life are observational. Hence the difficulty of interpretation.

By the way, here is a rough summary.
- Multiple linear regression (also ALL the methods in the GENERALIZED
LINEAR REGRESSION) accounts for the maximum variance of Y.
- PCA (and thus PCR) accounts for the maximum variance of X.
- PLS accounts for the variance of X AND Y together (each component
maximizes the covariance between X and Y).
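
The difference between the PCA and PLS bullets can be seen in a small
sketch. The data are hypothetical and already centered: x1 varies a
lot, x2 varies little, the two are uncorrelated, and y depends only on
x2. I assume a single response, so the first PLS weight vector is
proportional to X'y. The first PCA direction is the leading eigenvector
of X'X, computed here with the closed form for a 2x2 symmetric matrix.

```python
import math

# Hypothetical centered data (each column sums to zero).
x1 = [-6.0, -3.0, 0.0, 3.0, 6.0]
x2 = [1.0, -2.0, 2.0, -2.0, 1.0]
y = list(x2)  # y depends only on the low-variance predictor

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

# PCA: leading eigenvector of X'X, which never looks at y.
s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
pca_dir = [math.cos(theta), math.sin(theta)]

# PLS (single response): first weight vector is proportional to X'y.
pls_dir = unit([dot(x1, y), dot(x2, y)])

print("PCA direction:", pca_dir)
print("PLS direction:", pls_dir)
```

In this contrived case PCA points along x1, because that is where X
varies most, while PLS points along x2, because that is where X
covaries with y. PCR on the first component alone would miss the
relationship entirely.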

I have found it helpful to think about the intrinsic structure among
predictors before applying any method: whether some predictors are
causally related and should therefore be combined or deleted.

Hope this helps.

Sangdon Lee, Ph.D.
GM Tech Center.
