Re: Principal Component Analysis
- From: Art Kendall <Arthur.Kendall@xxxxxxxxxxx>
- Date: Fri, 16 May 2008 15:19:19 GMT
Of course, a lot depends on the nature of the data, and the substantive field you are working in.
CATPCA will allow you to find groups of variables that are "measuring somewhat the same thing". There are parts of any factor analysis that require subject matter knowledge and judgment. You need to balance the number of factors to retain against the meaningfulness of the construct underlying each factor. The factors are usually interpreted as more abstract constructs than the constructs for the individual items. A scale is often created considering mostly the common variance of the items. Therefore, the unique variance is "lost". Think of a set of achievement items. A researcher is more often interested in "spelling" than in whether a student can spell "cat". Were your variables originally intended to be items in scales?
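As a rough illustration of "finding groups of variables", here is a minimal sketch in Python with NumPy. It uses ordinary PCA on a correlation matrix, not CATPCA, so it assumes numeric items. The data are simulated, and the two latent constructs, the six items, and the noise levels are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical latent constructs (say "spelling" and "arithmetic").
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

# Six made-up items: items 0-3 reflect f1, items 4-5 reflect f2.
# Each item also carries its own unique (item-specific) noise.
items = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(4)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(2)]
)

# Ordinary PCA via eigen-decomposition of the correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings = eigenvector scaled by sqrt(eigenvalue); keep two components.
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Assign each item to the component on which it loads most strongly.
groups = np.abs(loadings).argmax(axis=1)

# Share of total variance kept by the two retained components.
explained = eigvals[:2].sum() / eigvals.sum()

print("item -> component:", groups.tolist())
print("variance retained:", round(float(explained), 3))
```

The groups come out cleanly here only because the simulated constructs are independent and of unequal size. With real data a rotation is usually needed, and how many components to retain remains the judgment call described above.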
CATREG will allow you to see how well a set of variables (categorical and/or continuous) fits/predicts a dependent variable (categorical or continuous). That set can be all the raw variables or the set of summative scales based on the factor analysis.
One way to see how much of the variance of the dependent variable is NOT "accounted for" by using scales derived from a factor analysis is to enter all of the 30 items, letting the software drop items that are found to be perfectly collinear. Then compare that fit with the fit from entering the scores from the factor analysis. It is then a judgment call whether the "simpler", "more abstract" model using the scales is more useful than the model using the "more concrete" items.
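The comparison described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not CATREG: it uses ordinary least squares on simulated numeric data. The helper r_squared, the six items, and the outcome y are all hypothetical. The outcome is deliberately built to depend partly on the unique part of one item, so that the scales lose something.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    # lstsq returns a minimum-norm solution, so perfectly collinear
    # columns do not break the fit; they simply add nothing to it.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 400

# Two hypothetical constructs, each measured by three noisy items.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
items = np.column_stack(
    [f1 + 0.6 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.6 * rng.normal(size=n) for _ in range(3)]
)

# Made-up outcome: depends on both constructs, plus something that is
# unique to item 0 (the part of item 0 not shared with f1).
y = f1 + 0.5 * f2 + 0.8 * (items[:, 0] - f1) + rng.normal(size=n)

# Summative scales: mean of the standardized items in each group.
z = (items - items.mean(axis=0)) / items.std(axis=0)
scales = np.column_stack([z[:, :3].mean(axis=1), z[:, 3:].mean(axis=1)])

r2_items = r_squared(items, y)    # the "more concrete" model
r2_scales = r_squared(scales, y)  # the "simpler", "more abstract" model

print("R^2, all items:", round(float(r2_items), 3))
print("R^2, scales   :", round(float(r2_scales), 3))
print("difference    :", round(float(r2_items - r2_scales), 3))
```

Because the scales are linear combinations of the items, the in-sample fit from the items can never be worse than the fit from the scales. The difference between the two R^2 values is what the scales give up, and whether that difference matters is the judgment call.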
Art Kendall
Social Research Consultants
David wrote:
Hi all again, thanks for your input, it is greatly appreciated. You
may have guessed my statistics level by now. I am trying to get some
direction as to where to look from here.
Following your discussion, from what I understand, there seems to be
a general consensus that feature selection needs to be addressed
together with the dependent variable, so that information that might
be useful in predicting Y is not dropped. And PLS does this, unlike
PCA and "Correlation among predictors". But on the other hand, I
wonder why finding groups of correlated predictors and then choosing a
member of each group to relate it to the response variable would lead
to losing information? If I had 5 groups of correlated variables
among my 30 predictors, I could use the 5 representatives in the
prediction with no fear of losing information, couldn't I?
One thing that strikes me is that you are all talking about "(highly)
correlated" predictors. Is this generally the case for any dataset?
Paige, you say that "When you have 30 input variables, and they are
(highly) correlated,
the problem isn't the method -- in this case PLS -- the problem is
that you cannot in any way separate the distinct and independent
effects of each of the individual input variables."
Do you always get highly correlated variables? I am working on
clinical data, having different "biological" and "social" variables
(such as presence of symptoms, smoker, concentration of
metabolites,...) to try to predict a clinical outcome (response to
treatment, bad prognosis,...). I can understand that if all 30
variables are correlated, no method will allow me to do a good
selection of variables. If this is always the case, then why bother
trying to model data?
Thanks Art for the info on PCA. We use SPSS and will look into the
information. I am also trying R.
Regards,
David