Re: about PCA and variability??



The various forms of factor analysis can be used for "data reduction", sometimes for finding a few latent constructs underlying more numerous very specific measures. In many fields the interpretability (construing meaning) is important in deciding on the number of factors to retain.
The number of predictors (independent variables) makes a big difference in how many cases (data rows) are needed to do correlations, regressions, etc.

This is oversimplified. Suppose your x variables are a set of test questions on spelling, a set on addition and subtraction, etc. The various forms of factor analysis are often used to double-check which items "go together", so that instead of having 20 variables, each of which measures the spelling of particular words, you use a summary of them to represent general spelling achievement. You might then use the measure of general spelling achievement as a predictor of job success. On the one hand you are finding variables that "are pretty much measuring the same thing"; on the other hand you are interested in finding out whether a construct based on what is common to that set is related to a separate construct.
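
The spelling example above can be sketched numerically. This is only an illustration under stated assumptions: the data are simulated, the loading of 0.8 and the noise level of 0.6 are made up, and a principal component of the correlation matrix is used as a simple stand-in for a factor-analysis summary score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 500, 20

# Hypothetical latent "general spelling achievement" for each person.
ability = rng.normal(size=n_people)

# Each of the 20 items = shared ability + item-specific noise (made-up weights).
items = 0.8 * ability[:, None] + 0.6 * rng.normal(size=(n_people, n_items))

# Principal components of the item correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order

# Share of the total item variance carried by the largest component.
first_share = eigvals[-1] / eigvals.sum()

# One summary score per person: standardized items projected on that component.
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
score = z @ eigvecs[:, -1]

# The sign of an eigenvector is arbitrary, so compare in absolute value.
r_with_ability = abs(np.corrcoef(score, ability)[0, 1])

print(f"variance share of first component: {first_share:.2f}")
print(f"|correlation| of summary score with latent ability: {r_with_ability:.2f}")
```

With items built this way, a single score stands in for the 20 separate variables, and that score is what you would carry forward as the predictor of job success.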

Another example of grouping sets of more particular measures of a construct in order to create a stronger summative measure of that construct is in attitude measurement. M. Lorr et al. took a set of questions thought to measure liberalism-conservatism. They found that 3 factors could represent the common variance of several dozen specific questions.
Then, in relating liberalism-conservatism to voting, candidate preference, etc., they did not need dozens of predictors. They could use 3, which represented the three underlying factors of general liberalism-conservatism, egalitarianism, and favoring sexual freedom. Much subsequent research has found that these 3 factors have different relations to different kinds of social issues.

If a few factors can meaningfully summarize the variance of a larger set of measures, a researcher can do her/his theorizing, modeling, and analysis based on those more abstract constructs. Those constructs can then be used to relate to other constructs.

Art Kendall
Social Research Consultants

onyourmark wrote:
On Jun 27, 3:47 am, xhos...@xxxxxxxxx wrote:
onyourmark <william...@xxxxxxxxx> wrote:

Hi and thanks to all who have responded to my query. My question is, I
suppose, say in regard to the above post, why are we interested in
whether one of the original variables or one of the new derived
variables might capture 98% of the variance of all of the original
variables or not. I mean, I guess this is a very basic question, but
why are we interested in the variation of the variables in the first
place.
Maybe you aren't interested in that. In which case, you probably shouldn't
do a PCA analysis. It's a tool for a job. If you have no need for that
job, you have no need for that tool.

Xho


Hi and thanks to all again. I am interested in PCA. May I ask, (sorry
for being obtuse), when you say "that is where all the action occurs"
you are saying that all the variability is in those two dimensions and
that only 0.01 percent of all the variability lies in the z axis, but
aren't you concerned with predicting a fourth variable (a response
variable, say Y)? Otherwise I don't understand what you mean by that
is where the action occurs. I mean, I understand that most of the
variation occurs in those two dimensions but why is the variation
important?
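
The situation described above, where nearly all the variability lies in two dimensions, can be reproduced with simulated data. This is a rough sketch under assumptions: the point cloud, the spread of 0.014 in the thin direction, and the random rotation are all made up so that the thin direction carries roughly 0.01 percent of the total variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# A nearly flat cloud: wide in two directions, very thin in the third.
flat = np.column_stack([
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 1.0, n),
    rng.normal(0.0, 0.014, n),
])

# Rotate with a random orthogonal matrix so the thin direction
# no longer lines up with any of the original axes.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
data = flat @ q.T

# PCA via the eigenvalues of the covariance matrix, largest first.
cov = np.cov(data, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]
share = eigvals / eigvals.sum()

for i, s in enumerate(share, start=1):
    print(f"component {i}: {100 * s:.4f}% of total variance")
```

After the rotation none of x, y, z looks flat on its own, yet the third component still carries almost none of the variance, which is one way to read "that is where all the action occurs".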
I can see that, for example, if data is constant with respect to a
certain variable, say X1, so that for every case/individual X1 has the
same value, say X1=5, across all observations, then X1 will be useless
in predicting Y (or as it is sometimes said "variation in Y") because
X1 is 5 no matter what value Y is (if you tell me that for this
individual/observation/case X1 is 5, that is not going to help me to
predict Y at all). And by extension if X1 is not constant but has
almost no variation then it will be almost useless in predicting the
variation in Y.
So is this why we are interested in the variation of the variables?
Because they are input variables? Or is there some other reason?
Thanks again.
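
The constant-predictor case described above can be checked numerically. A minimal sketch, assuming simulated data (the seed, the sample size, and the Y values are made up); it covers only the case where X1 is exactly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

y = rng.normal(size=n)   # made-up response values
x1 = np.full(n, 5.0)     # X1 = 5 for every case

# Design matrix with an intercept column and the constant predictor.
design = np.column_stack([np.ones(n), x1])

# Both columns point in the same direction, so the matrix has rank 1:
# X1 adds nothing that the intercept does not already provide.
rank = np.linalg.matrix_rank(design)

# Least-squares fit; for a rank-deficient design lstsq returns the
# minimum-norm solution, and the fitted values are still well defined.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef

print("variance of X1:", x1.var())
print("rank of design matrix:", rank)
print("fitted values all equal mean(Y):", np.allclose(fitted, y.mean()))
```

Knowing that X1 is 5 leaves the prediction at the overall mean of Y for every case, which is the same prediction you would make without X1.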