Re: Principal Component Analysis



Hi David,

Some suggestions:

1. If your categorical variables are ordered-categorical, then you
can calculate:
a. Pearson correlations between each pair of continuous variables
b. Polyserial correlations between each continuous and each
ordered-categorical variable
c. Polychoric correlations between each pair of ordered-categorical
variables

then place these in a single matrix and analyze that matrix by PCA. A
program like LISREL/PRELIS will do all this for you more or less
automatically.
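For anyone doing the last part of step 1 by hand rather than in PRELIS, here is a minimal Python/NumPy sketch. It assumes the combined matrix R (Pearson, polyserial and polychoric entries) has already been estimated elsewhere; it does not estimate polyserial or polychoric correlations itself. The function name pca_from_correlation and the example matrix are made up for illustration. One practical point: a matrix assembled from different estimators is not guaranteed to be positive semidefinite, so the sketch checks the smallest eigenvalue.

```python
import warnings

import numpy as np


def pca_from_correlation(R):
    """PCA of a precomputed correlation matrix R (variables x variables).

    Returns eigenvalues in descending order, component loadings
    (eigenvectors scaled by the square root of their eigenvalues), and the
    proportion of total variance carried by each component.
    """
    R = np.asarray(R, dtype=float)
    if R.ndim != 2 or R.shape[0] != R.shape[1] or not np.allclose(R, R.T):
        raise ValueError("R must be a square, symmetric matrix")
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # A matrix built from Pearson, polyserial and polychoric estimates is not
    # guaranteed to be positive semidefinite; a negative eigenvalue reveals it.
    if eigvals[-1] < -1e-8:
        warnings.warn(f"smallest eigenvalue is {eigvals[-1]:.4f}; R is not positive semidefinite")
    # Scale column j of the eigenvectors by sqrt(eigenvalue j) to get loadings
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # The eigenvalues sum to the trace of R, i.e. the number of variables
    explained = eigvals / eigvals.sum()
    return eigvals, loadings, explained


# Illustrative 3-variable matrix (made-up values, not real estimates)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
eigvals, loadings, explained = pca_from_correlation(R)
print(np.round(explained, 3))
```

Rows of loadings correspond to the variables in the order they appear in R; columns are the components.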

2. Although I agree with what others have posted, personally I prefer
the approach you originally suggested: treating data reduction and
the modeling of your response variable as two separate steps.

3. Since you just want to select a subset of non-redundant variables,
you have other options besides PCA. For example, you can use
hierarchical cluster analysis on the correlation matrix. That will
divide your variables into clusters. Then you can pick 'exemplars'
from each cluster and use those in your data model. This gives you
more flexibility, because you can use other measures of similarity/
redundancy among your variables besides correlation coefficients. For
example, if your categorical variables are non-ordered (i.e., purely
nominal variables), you can calculate the canonical correlation
between each pair of them, treating each variable as a set of dummy
(0/1) indicator variables. Then you can cluster-analyze the matrix of
canonical correlation coefficients to divide the variables into
separate groups, and then select exemplars from each group.
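A rough Python sketch of point 3, assuming NumPy and SciPy are available. first_canonical_correlation computes the largest canonical correlation between the dummy codings of two nominal variables, using the fact that the singular values of the standardized residuals of their contingency table are those canonical correlations. cluster_exemplars then runs average-linkage hierarchical clustering on the distance 1 - |r| and picks one exemplar per cluster. The distance definition, the linkage method, the exemplar rule (highest mean association with the rest of its cluster), all function names and the toy data are illustrative choices, not the only reasonable ones. cluster_exemplars works on any association matrix, including an ordinary Pearson correlation matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the dummy codings of two nominal variables."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1.0)                     # contingency table of counts
    P = table / table.sum()                             # joint proportions
    expected = np.outer(P.sum(axis=1), P.sum(axis=0))   # r_i * c_j under independence
    # Singular values of D_r^{-1/2} (P - r c^T) D_c^{-1/2} are the canonical
    # correlations between the two sets of dummy variables.
    S = (P - expected) / np.sqrt(expected)
    return float(np.linalg.svd(S, compute_uv=False).max())


def association_matrix(columns):
    """Symmetric matrix of pairwise first canonical correlations (1s on the diagonal)."""
    k = len(columns)
    R = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            R[i, j] = R[j, i] = first_canonical_correlation(columns[i], columns[j])
    return R


def cluster_exemplars(R, names, n_clusters):
    """Cluster variables on 1 - |r| and return cluster labels plus one exemplar per cluster."""
    A = np.abs(np.asarray(R, dtype=float))
    D = np.clip(1.0 - A, 0.0, 1.0)          # strong association -> small distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    exemplars = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Illustrative rule: the member with the highest mean association to its cluster
        mean_assoc = A[np.ix_(members, members)].mean(axis=1)
        exemplars[int(k)] = names[members[np.argmax(mean_assoc)]]
    return labels, exemplars


# Made-up toy data: b duplicates a, d duplicates c
names = ["a", "b", "c", "d"]
a = ["x", "y", "z", "x", "y", "z", "x", "y"]
c = ["p", "q", "p", "q", "p", "q", "p", "q"]
R = association_matrix([a, list(a), c, list(c)])
labels, exemplars = cluster_exemplars(R, names, n_clusters=2)
print(labels, exemplars)
```

Here a and b fall in one cluster and c and d in the other, with one exemplar kept from each.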

Possibly you can include the canonical correlations in the overall
matrix described in point 1 above. I'm not sure, because they
might tend to run lower overall than Pearson correlations.

Hope this helps.

John Uebersax PhD

On May 8, 6:21 pm, David <david_art...@xxxxxxxxxxx> wrote:
Dear list,
is it possible to use PCA on categorical data?
I have a group of 30 continuous and categorical variables and would like to
select a subset for modeling a response variable. I read that PCA
would help me do this data reduction, but all the examples I have
seen involve continuous data.

Thanks for your help,
D.
