Re: Principal Component Analysis???



PCA transforms a set of variables into another set of variables (same
number), in which the original data points have been rotated
(I'll spare you the details about centering and scaling). After this
process, the first principal component contains the maximum possible
variance. Further, the second principal component contains the maximum
possible variance, subject to the constraint that it not be correlated
with the first principal component, and so on for the third principal
component, etc. In effect, this concentrates information (variance,
really).
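
To make the rotation concrete, here is a minimal sketch in base MATLAB
(no toolbox needed). The data matrix X is made up purely for
illustration, and the variable names are my own. It centers the data,
takes the eigenvectors of the covariance matrix as the new axes, and
checks that the rotated variables are uncorrelated with variances in
decreasing order.

```matlab
% Hypothetical data: 200 observations of 3 correlated variables
rng(0);
X = randn(200, 3) * [2 0 0; 1 1 0; 0.5 0.2 0.1];

% Center each column (PCA works on deviations from the mean)
Xc = X - repmat(mean(X, 1), size(X, 1), 1);

% Eigenvectors of the covariance matrix give the new axes
[V, D] = eig(cov(Xc));
[variances, order] = sort(diag(D), 'descend');  % largest variance first
coeff = V(:, order);

% Rotate the centered data onto the new axes
score = Xc * coeff;

% Covariance of the scores: off-diagonal entries are zero up to
% rounding, and the diagonal matches 'variances'
disp(cov(score));
disp(variances');
```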

In some cases, nearly all of the total variance is contained in the
first (or first few) principal components (high "overlap" among
original variables = low number of "effective" dimensions: powerful
concentration of information via PCA), and in some cases the PCs all
have nearly the same variance (low "overlap" among original variables =
large number of "effective" dimensions: poor concentration via PCA).
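
As a rough illustration, the five variances quoted in the question
below can be turned into shares of the total variance. Only the five
numbers come from the post; the variable names are mine.

```matlab
% Variances reported by princomp in the quoted question
variances = [175.9826; 15.8123; 13.4824; 9.0129; 4.8680];

fraction   = variances / sum(variances);  % share of total variance per PC
cumulative = cumsum(fraction);            % running total over the PCs

% Column 1: share per component, column 2: cumulative share
disp([fraction cumulative]);
```

For these numbers the first component carries about 80% of the total
variance and the first three together about 94%, which is a fairly
strong concentration.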

So, you've successfully performed PCA on your original m variables
and now have a shiny, new set of m principal components. What do you
do with them?

Some people work directly with the principal components, for example,
in scatterplots. Another possibility is modeling (regression) using
the principal components as predictor variables instead of the
originals. The assumption (and it's a big one) is that variance equals
predictive power and that models with fewer predictors will have the
same accuracy. A third possibility is to interpret the coefficients of
the principal components.
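
A minimal sketch of the regression idea, in base MATLAB. Everything
here is an assumption for illustration: the predictors X, the response
y, and the choice of k = 3 components are made up. It fits ordinary
least squares on the first k component scores instead of on the
original variables.

```matlab
% Hypothetical predictors and response
rng(2);
n = 200;
X = randn(n, 5) * randn(5, 5);
y = X * [1; -2; 0.5; 0; 3] + 0.1 * randn(n, 1);

% Principal component scores of the centered predictors
Xc = X - repmat(mean(X, 1), n, 1);
[V, D] = eig(cov(Xc));
[~, order] = sort(diag(D), 'descend');
score = Xc * V(:, order);

% Least-squares fit on an intercept plus the first k components
k = 3;
A = [ones(n, 1), score(:, 1:k)];
b = A \ y;

% Residual size shows how much predictive power the dropped PCs held
resid = y - A * b;
fprintf('RMS residual with %d PCs: %g\n', k, sqrt(mean(resid.^2)));
```

Rerunning with different k shows whether the low-variance components
mattered for predicting y. Nothing guarantees that they don't.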

People also "zero out" the coefficients of the last so many principal
components (those with the least variance) and invert the process,
yielding a filtered or smoothed version of the original data. This is
sometimes known as "subspace projection".
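
For the question asked below (keep the first 3 components and return
to the original variables), a sketch along these lines should work.
It reuses the output names from the question; the data matrix X is a
made-up stand-in for PTG_Parameter_full, and princomp requires the
Statistics Toolbox. As far as I know, newer releases replace princomp
with pca, which returns the same first three outputs.

```matlab
% Hypothetical stand-in for the real data: 100 observations, 5 variables
rng(1);
X = randn(100, 5) * randn(5, 5);

% princomp centers X internally before rotating it
[pcs, newdata, variances] = princomp(X);

k  = 3;             % number of principal components to keep
mu = mean(X, 1);    % column means that princomp removed

% Dropping score columns k+1..end is the same as zeroing them out.
% Multiplying by the transposed coefficients rotates back to the
% original variables; adding mu undoes the centering.
Xhat = newdata(:, 1:k) * pcs(:, 1:k)' + repmat(mu, size(X, 1), 1);

% Xhat has the same size as X; with k = 5 it equals X up to rounding
disp(size(Xhat));
```

Note that Xhat still has all 5 columns. It is a filtered version of
the original data, not a version with fewer variables. If the goal is
fewer variables, work with newdata(:, 1:k) directly.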

For better understanding and maybe some ideas for applying PCA, I
recommend getting a decent book on multivariate statistics. Try, for
instance:

"Multivariate Statistical Methods: A Primer", by Manly


-Will Dwinnell
http://will.dwinnell.com


miki lo wrote:
hello,
I want to perform the PCA on some data I have.
I wrote the command as suggested in the matlab help:

[pcs,newdata,variances,t2] = princomp(PTG_Parameter_full);

but what do I do with this data now? how do I get back to my original data and reduce the variables I don't need???
I will just say that I have 5 factors in my data, and that the results I got
for the PCA in terms of variance are:

variances =

175.9826
15.8123
13.4824
9.0129
4.8680

Which from what I understand means that the first Principal Component is the most important one.
So, now I wanna leave just the first 3 Principal Components and return to my Original data, how do I do this?
Thank you
