Re: PCA/LDA what use? interpretation?



On 26-Jan-2008, Corinna <corinnagirl@xxxxxxxxx> wrote:

> I also wonder if those methods are appropriate or not, and for what.
> The data itself consists of measurements of physical parameters, such
> as density, porosity etc. At the moment I'm not trying to accomplish
> anything as I don't understand what this is for. I can run these
> calculations with a software package but as the manual doesn't say
> what it is for (as opposed to all other things), and because these
> functions are so dominantly present I wonder what I can do with them.
>
> For PCA, in Wikipedia I read "PCA is mostly used as a tool in
> exploratory data analysis and for making predictive models"

Principal Component Analysis (PCA) -- which is closely related to Factor
Analysis (the two are not identical, but the idea is similar) -- tries to find
a set of factors (components) that group together variables that are linearly
correlated (i.e., similar to each other). Strictly speaking, PCA produces as
many components as there are variables, but if there are correlated variables,
only a few components will be significant, and the correlated variables will be
grouped together on a factor that describes them. For example, if you collect
information about people such as weight, height, body mass index, IQ and
education level and do a PCA, it is likely that PCA will find two significant
factors. One factor will group together weight, height and body mass index,
which are all related to the person's size, and the other factor will group
together IQ and education, which tend to be correlated.
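
A rough sketch of the example above in Python with NumPy. The data are
synthetic: it assumes two hidden factors ("size" and "ability") and a made-up
noise level, so the numbers themselves mean nothing and only the pattern
matters. The column names are just labels for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical hidden factors driving the measurements (assumption).
size = rng.normal(size=n)
ability = rng.normal(size=n)

def noisy(signal, noise=0.3):
    """Return the hidden factor plus independent measurement noise."""
    return signal + noise * rng.normal(size=n)

# Columns: weight, height, body mass index, IQ, education (synthetic stand-ins).
X = np.column_stack([noisy(size), noisy(size), noisy(size),
                     noisy(ability), noisy(ability)])

# PCA on standardized variables is an eigen-decomposition of the
# correlation matrix.
corr = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns ascending order; sort so the largest component comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()
# Common rule of thumb: keep components whose eigenvalue exceeds 1.
n_significant = int(np.sum(eigenvalues > 1.0))

print("share of variance per component:", np.round(explained, 3))
print("components with eigenvalue > 1:", n_significant)
print("loadings of first component:", np.round(eigenvectors[:, 0], 2))
```

With data built this way, two components should carry most of the variance,
and the first one should load mainly on the three size-related columns.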

There are a couple of useful things that come out of a PCA:

1. You can determine that you are really measuring a smaller number of
characteristics than the number of variables (because variables are
correlated). This could be useful for understanding a phenomenon. For
example, if you measure a zillion things about people and run a PCA, you may
find out that there are only 5 significant factors. This implies that all of
your measurements are only really measuring 5 dimensions about people. Maybe 5
dimensions are enough to describe people.

2. When there are a large number of variables, you can use PCA to reduce the
dimensionality so that fewer variables have to be fed into a subsequent
analysis. This is known as dimensionality reduction. Basically, you use values
from the original variables and the factors found by PCA to compute new factor
scores (variables). This is typically the reason why PCA is used as a part of
predictive modeling. Some types of predictive models can handle a large number
of variables better than others. For example, decision trees work fine with a
large number of variables, so I wouldn't use PCA when building a decision tree.
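
The second use, computing factor scores to feed into a later analysis, might
look roughly like this in Python with NumPy. This is a minimal sketch, not how
any particular package does it: the function name pca_scores, the choice to
standardize the variables first, and the synthetic 10-variable data set are
all assumptions made for the illustration.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project standardized data onto its first n_components principal axes."""
    X = np.asarray(X, dtype=float)
    # Standardize so that variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    # Each row of scores is one observation expressed in the new variables.
    scores = Z @ components.T
    return scores, components

# Synthetic data: 10 measured variables generated from 2 hidden ones.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

scores, components = pca_scores(X, n_components=2)
print(X.shape, "->", scores.shape)
```

The two columns of scores are uncorrelated with each other and can be used as
predictors in place of the original ten variables.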

LDA (Linear Discriminant Analysis) is one of many types of predictive models;
others include decision trees, neural networks, support vector machines,
regression and genetic algorithms, to mention a few. With all of these models
you feed in a set of predictor (independent)
variables along with a target (dependent) variable whose values you are trying
to predict. The training process attempts to construct a model that predicts
the target values as well as possible from the corresponding predictor values.
Once the model has been trained, you can then feed in a new set of predictor
values and get the corresponding predicted target value out of the model.

LDA is one of the older predictive modeling methods (it was invented by R.A.
Fisher in the 1930s). Given its age, it often does remarkably well compared
with newer methods. LDA can be used only for classification problems where the
target variable has a discrete set of categorical values. It cannot be used
for regression problems where the target variable has continuous values.
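
To make the train-then-predict cycle concrete, here is a minimal two-class
linear discriminant in Python with NumPy. It is a sketch under stated
assumptions, not the DTREG implementation: the function names fit_lda and
predict_lda are made up, the classes are labeled 0 and 1, both classes are
assumed to share one covariance matrix, and the training data are synthetic.

```python
import numpy as np

def fit_lda(X, y):
    """Fit a two-class linear discriminant; y must hold the labels 0 and 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (assumes both classes share it).
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    # Direction that best separates the two class means.
    w = np.linalg.solve(Sw, m1 - m0)
    # Offset: midpoint between the means, adjusted for class frequencies.
    b = -0.5 * w @ (m0 + m1) + np.log(len(X1) / len(X0))
    return w, b

def predict_lda(model, X):
    """Return the predicted class (0 or 1) for each row of X."""
    w, b = model
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# Synthetic training set: two predictors, two well-separated classes.
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
                     rng.normal(loc=3.0, size=(100, 2))])
y_train = np.array([0] * 100 + [1] * 100)

model = fit_lda(X_train, y_train)
accuracy = np.mean(predict_lda(model, X_train) == y_train)

# New predictor values go in, predicted categories come out.
new_points = np.array([[0.0, 0.0], [3.0, 3.0]])
predictions = predict_lda(model, new_points)
print("training accuracy:", accuracy)
print("predictions for new points:", predictions)
```

Note that the output is a category, never a continuous number, which is the
restriction described above.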

I invite you to try the demo version of my DTREG program (http://www.dtreg.com).
It has many types of predictive models including LDA, neural networks, decision
trees, SVM and gene expression programming. But it does not have PCA.

--
Phil Sherrod
(PhilSherrod 'at' comcast.net)
http://www.dtreg.com (Decision trees, Neural networks, SVM and Genetic
modeling)
http://www.nlreg.com (Nonlinear Regression)