Re: Principal Component Analysis



Of course, a lot depends on the nature of the data, and the substantive field you are working in.

CATPCA will allow you to find groups of variables that are "measuring somewhat the same thing". There are parts of any factor analysis that require subject matter knowledge and judgment. You need to balance the number of factors to retain against the meaningfulness of the construct underlying each factor. The factors are usually interpreted as more abstract constructs than the constructs for the individual items. A scale is often created considering mostly the common variance of the items. Therefore, the unique variance of each item is "lost". Think of a set of achievement items. A researcher is more often interested in "spelling" than in whether a student can spell "cat". Were your variables originally intended to be items in scales?
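To make "groups of variables measuring somewhat the same thing" concrete, here is a minimal sketch in Python. It is not CATPCA: it runs an ordinary PCA on the correlation matrix of simulated continuous items, and the data, the two-group structure, and the "eigenvalue greater than 1" retention rule are all assumptions made for illustration. With real data the number of components to keep remains a judgment call.

```python
import numpy as np

# Simulated data (an assumption for illustration): 7 items driven by 2 latent factors.
rng = np.random.default_rng(1)
n = 1000
f1, f2 = rng.normal(size=(2, n))
items = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(4)]    # items 0-3 share factor 1
    + [f2 + 1.0 * rng.normal(size=n) for _ in range(3)]  # items 4-6 share factor 2
)

# PCA via the eigen-decomposition of the item correlation matrix.
R = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Heuristic only: keep components whose eigenvalue exceeds 1.
k = int(np.sum(eigvals > 1.0))

# Loadings are the correlations between each item and each retained component.
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Assign each item to the component on which it loads most strongly.
groups = np.argmax(np.abs(loadings), axis=1)

print("components retained:", k)
print("item -> component:", groups.tolist())
```

Items that land on the same component are candidates for one scale; whether that scale is a meaningful construct is the part that needs subject matter knowledge.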

CATREG will allow you to see how well a set of variables (categorical and/or continuous) fits/predicts a dependent variable (categorical or continuous). That set can be all the raw variables or the set of summative scales based on the factor analysis.

One way to see how much of the variance of the dependent variable is NOT "accounted for" by using scales derived from a factor analysis is to enter all 30 items, letting the software drop any items that are found to be perfectly collinear. Then compare that fit with the fit from entering the scores from the factor analysis. It is then a judgment call whether the "simpler", "more abstract" model using the scales is more useful than the model using the "more concrete" items.
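The comparison just described can be sketched as follows. This is an illustration under stated assumptions, not CATREG: it uses ordinary least squares and plain PCA scores on simulated continuous data. The 5-factor, 30-item layout and the outcome, which deliberately depends on the unique part of one item, are made up, and `r_squared` is a hypothetical helper.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept added."""
    A = np.column_stack([np.ones(len(y)), X])
    # lstsq still returns a solution if some columns are perfectly collinear.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Simulated data (an assumption): 5 latent factors, 6 items each, 30 items in total.
rng = np.random.default_rng(0)
n, n_factors, items_per_factor = 500, 5, 6
factors = rng.normal(size=(n, n_factors))
# Each item is its factor plus unique noise; columns 0-5 belong to factor 0, and so on.
items = np.repeat(factors, items_per_factor, axis=1) + rng.normal(size=(n, 30))

# The outcome depends on factor 0 AND on the unique part of the last item.
unique_part = items[:, 29] - factors[:, 4]
y = factors[:, 0] + 0.8 * unique_part + rng.normal(size=n)

# Component scores: project the standardized items on the first 5 principal components.
Z = (items - items.mean(axis=0)) / items.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:n_factors].T

r2_items = r_squared(items, y)    # the "more concrete" model: all 30 items
r2_scores = r_squared(scores, y)  # the "more abstract" model: 5 component scores

print(f"R^2 using all 30 items:        {r2_items:.3f}")
print(f"R^2 using 5 component scores:  {r2_scores:.3f}")
print(f"R^2 given up by using scores:  {r2_items - r2_scores:.3f}")
```

Because the scores are linear combinations of the items, the item model can never fit worse in-sample. The size of the gap, ideally checked on held-out data since 30 predictors will overfit more than 5, is what informs the judgment call.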


Art Kendall
Social Research Consultants

David wrote:
Hi all again, thanks for your input, it is greatly appreciated. You
may be guessing by now my statistics level. I am trying to get some
direction as to where to look from here.
Following your discussion, as far as I understand, there seems to be
a general consensus that feature selection needs to be addressed
together with the dependent variable, so that information that might
be useful in predicting Y is not dropped. And PLS does this, unlike
PCA and "Correlation among predictors". But on the other hand, I
wonder why finding groups of correlated predictors and then choosing a
member of each group to relate it to the response variable would lead
to losing information? If I had 5 groups of correlated variables
among my 30 predictors, I could use the 5 representatives in the
prediction with no fear of losing information, couldn't I?

One thing that strikes me is that you are all talking about "(highly)
correlated" predictors. Is this generally the case for any dataset?
Paige, you say that "When you have 30 input variables, and they are
(highly) correlated,
the problem isn't the method -- in this case PLS -- the problem is
that you cannot in any way separate the distinct and independent
effects of each of the individual input variables."
Do you always get highly correlated variables? I am working on
clinical data, having different "biological" and "social" variables
(such as presence of symptoms, smoker, concentration of
metabolites,...) to try to predict a clinical outcome (response to
treatment, bad prognosis,...). I can understand that if all 30
variables are correlated, no method will allow me to do a good
selection of variables. If this is always the case, then why bother
trying to model data?

Thanks Art for the info on PCA. We use SPSS and will look into the
information. I am also trying R.

Regards,
David