Re: Principal Component Analysis



On May 15, 8:32 am, John Uebersax <jsueber...@xxxxxxxxx> wrote:
Hi David,

Some suggestions:

1.  If your categorical variables are ordered-categorical, then you
can calculate:
    a. Pearson correlations between each pair of continuous variables
    b. Polyserial correlations between continuous and ordered-categorical variables
    c. Polychoric correlations between ordered-categorical variables

then place these in a single matrix and analyze that matrix by PCA.  A
program like LISREL/Prelis will do all this for you more-or-less
automatically.
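
Whatever program builds the combined matrix, the PCA step itself is an
eigendecomposition of that matrix. A minimal NumPy sketch, assuming the
Pearson/polyserial/polychoric matrix R has already been computed
elsewhere; the function name and the example values are made up for
illustration. Eigenvalues are clipped at zero before taking square
roots because a matrix assembled pair by pair is not guaranteed to be
positive semidefinite.

```python
import numpy as np

def pca_from_correlation(R):
    """PCA of a p x p correlation matrix (hypothetical helper).

    Returns eigenvalues (component variances) in descending order, the
    proportion of total variance each explains, and loadings
    (eigenvectors scaled by the square root of their eigenvalue)."""
    R = np.asarray(R, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()         # trace of R equals p
    # column j of eigvecs is scaled by sqrt(lambda_j); negatives clipped to 0
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, explained, loadings

# Made-up 3 x 3 matrix standing in for a mixed Pearson/polychoric matrix
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
eigvals, explained, loadings = pca_from_correlation(R)
print(np.round(explained, 3))
print(np.round(loadings[:, 0], 3))   # loadings on the first component
```

The loadings show how strongly each original variable is tied to each
component, which is where interpretation of the components starts.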

2.  Although I agree with what others have posted, personally I prefer
the approach you originally suggested: treat data reduction and the
modeling of your response variable as two separate steps.

3. Since you just want to select a subset of non-redundant variables,
you have other options besides PCA. For example, you can use
hierarchical cluster analysis on the correlation matrix. That will
divide your variables into clusters. Then you can pick 'exemplars'
from each cluster and use those in your data model. This gives you
more flexibility, because you can use other measures of similarity or
redundancy among your variables besides correlation coefficients. For
example, if your categorical variables are non-ordered (i.e., purely
nominal variables), you can calculate the canonical correlation
between each pair of them. Then you can cluster analyze the matrix of
canonical correlation coefficients to divide the variables into
separate groups, and then select exemplars from each group.
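
One way to carry out point 3, sketched with NumPy and SciPy. The
choices here are assumptions, not part of the post: dissimilarity is
taken as 1 - |r|, clusters are formed by average linkage, and a
cluster's exemplar is the member with the highest mean absolute
correlation with the other members. The function name and the small
example matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_exemplars(R, names, n_clusters):
    """Cluster variables from a similarity matrix R (hypothetical helper)
    and return cluster labels plus one exemplar name per cluster."""
    A = np.abs(np.asarray(R, dtype=float))
    D = 1.0 - A                          # assumed dissimilarity: 1 - |r|
    D = (D + D.T) / 2.0                  # guard against tiny asymmetries
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    exemplars = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size == 1:                # singleton cluster is its own exemplar
            exemplars[int(c)] = names[idx[0]]
            continue
        sub = A[np.ix_(idx, idx)]
        # mean |r| with the other members, excluding each variable's own diagonal
        mean_r = (sub.sum(axis=1) - np.diag(sub)) / (idx.size - 1)
        exemplars[int(c)] = names[idx[np.argmax(mean_r)]]
    return labels, exemplars

names = ["a", "b", "c", "d"]
R = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.8],
     [0.1, 0.1, 0.8, 1.0]]
labels, exemplars = cluster_exemplars(R, names, n_clusters=2)
print(labels, exemplars)
```

The same function accepts any symmetric similarity matrix with ones on
the diagonal, so a matrix of canonical correlations can be passed in
place of Pearson correlations.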

Possibly you can include the canonical correlations in the overall
matrix as described in point 1 above. I'm not sure, because they
might tend to run lower overall than Pearson correlations.
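
For two nominal variables, the canonical correlation is computed
between their dummy-coded (indicator) versions. A minimal NumPy sketch,
assuming each variable has at least two observed levels: the largest
canonical correlation is the largest singular value of Qx'Qy, where Qx
and Qy are orthonormal bases of the centered dummy matrices. Function
names and data are hypothetical. For two binary variables, such as
"family history" yes/no, the result equals the absolute value of the
Pearson (phi) correlation on 0/1 codes; canonical correlations are
never negative, so the sign is lost.

```python
import numpy as np

def _centered_dummies(x):
    """Indicator columns for all levels but the first, column-centered."""
    x = np.asarray(x)
    levels = np.unique(x)
    D = (x[:, None] == levels[None, :]).astype(float)
    D = D[:, 1:]                     # drop one level (redundant after centering)
    return D - D.mean(axis=0)

def canonical_corr_nominal(x, y):
    """Largest canonical correlation between two nominal variables
    (hypothetical helper)."""
    Qx, _ = np.linalg.qr(_centered_dummies(x))
    Qy, _ = np.linalg.qr(_centered_dummies(y))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s.max(), 1.0))  # clamp floating-point overshoot

# Made-up binary data
family_history = ["yes", "no", "no", "yes", "yes", "no", "no", "yes"]
smoker         = ["yes", "no", "yes", "yes", "no", "no", "no", "yes"]
r_cc = canonical_corr_nominal(family_history, smoker)

fh = np.array([v == "yes" for v in family_history], dtype=float)
sm = np.array([v == "yes" for v in smoker], dtype=float)
phi = np.corrcoef(fh, sm)[0, 1]
print(round(r_cc, 4), round(abs(phi), 4))   # the two agree (0.5 here)
```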

Hope this helps.

John Uebersax PhD

On May 8, 6:21 pm, David <david_art...@xxxxxxxxxxx> wrote:



Dear list,
is it possible to use PCA on categorical data?
I have a group of 30 continuous and categorical variables and would
like to select a subset for modeling a response variable. I read that
PCA would help me do this data reduction, but all the examples I have
seen involve continuous data.

Thanks for your help,
D.

Thank you all for your input. Here are some comments:

- Art, you suggest some PCA methods, but my initial worry about using
PCA is losing interpretability.

- Paige, you suggest PLS, but isn't PLS effectively doing what PCA or
principal component regression does? I have only read through it
quickly, and had a look at Faraway's "Practical Regression and ANOVA
using R", which says "On the other hand, PLS is virtually useless for
explanation purposes". So how can I trace the results back to my
original regressors after doing PLS?
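
On the "trace back" question, one partial answer exists for principal
component regression (not PLS): because each component is a linear
combination of the standardized predictors, the coefficients fitted on
the k retained components map back to one coefficient per original
variable (beta = V_k gamma). A minimal NumPy sketch; the function name
and the simulated data are hypothetical. Note that every original
variable still receives a weight, so this aids interpretation but does
not by itself select a subset of variables.

```python
import numpy as np

def pcr_coefficients(X, y, k):
    """PCR on standardized predictors keeping k components (hypothetical
    helper); returns one coefficient per original predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    yc = y - y.mean()
    # right singular vectors of Z = eigenvectors of the correlation matrix
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    Vk = Vt[:k].T                                       # p x k loadings
    scores = Z @ Vk                                     # component scores
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    return Vk @ gamma                                   # back to original variables

# Simulated data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + rng.normal(scale=0.1, size=50)
print(np.round(pcr_coefficients(X, y, k=2), 3))
```

With k equal to the number of predictors, the back-transformed
coefficients coincide with ordinary least squares on the standardized
predictors.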

- John, you suggest calculating a correlation for every pair of my
variables and then performing hierarchical clustering to select a
representative from each of the groups. That sounds very interesting.
So if I have 30 variables, would I end up with a 30x30 correlation
matrix that could be fed to a clustering algorithm? My categorical
variables are generally non-ordered, like "family history" yes/no.
What kind of correlation measure could I use for non-ordered
categorical variables?

Thanks for your useful comments

D.