Re: The magic of PCA in machine learning



Good questions.

1. Counting parameters is a useful rule of thumb, but not all
parameters are created equal. As a worth-knowing but only loosely
related example, I can store an arbitrary amount of information in
a single real number if I am allowed an arbitrary-precision
representation. Using such an approach, I can create one-parameter
spaces of functions with infinite VC-dimension. Usually, if the
parameters are of the same type (and I am not providing a precise
definition here), then having fewer parameters leads to a less
complex space, and you might expect better generalization.
However, comparing the number of eigenvector parameters to the
dimensionality of a feature space is an apples-to-oranges
comparison; you can't really say anything.
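
To make the one-parameter remark in point 1 concrete, here is a small
sketch of the textbook construction f(x) = sign(sin(omega * x)). The
choice of points x_i = 10^-i and the helper names encode_labels and
classify are mine, picked for illustration; the idea is only that the
digits of omega store the desired labels, so a single real parameter
can reproduce every labeling of the n points.

```python
import itertools
import math

def encode_labels(labels):
    """Pack a tuple of +1/-1 labels into one real parameter omega.

    Label i (for the point x_i = 10**-i) is stored in the i-th decimal
    digit of omega / pi: digit 0 means +1, digit 1 means -1.
    """
    digits = sum(((1 - y) // 2) * 10**i for i, y in enumerate(labels, start=1))
    return math.pi * (1 + digits)

def classify(omega, x):
    """One-parameter classifier: the sign of sin(omega * x)."""
    return 1 if math.sin(omega * x) > 0 else -1

n = 5  # kept small so double precision is comfortably sufficient
points = [10.0**-i for i in range(1, n + 1)]

# Check that every one of the 2**n labelings is realized by some omega.
shattered = all(
    tuple(classify(encode_labels(labels), x) for x in points) == labels
    for labels in itertools.product((1, -1), repeat=n)
)
print("all", 2**n, "labelings realized:", shattered)
```

Larger n works the same way in principle, but omega then needs more
digits than a double can hold, which is exactly the arbitrary-precision
point.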

2. Different algorithms have different sensitivities to irrelevant or
very slightly relevant dimensions. My experience is that a
well-tuned linear SVM (or regularized least-squares classifier),
where well-tuned means you were careful about selecting the
regularization parameter by some form of cross-validation, is
highly resistant to minimally relevant or irrelevant dimensions.
I have also found that doing a linear dimensionality reduction
such as PCA will not improve the accuracy of a well-tuned SVM-style
algorithm. I am not making any statements about the various forms
of nonlinear dimensionality reduction.
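
If you want to try the comparison in point 2 yourself, something along
these lines is a reasonable starting point. It is only a sketch under
my own assumptions: synthetic Gaussian data with 5 informative and 45
irrelevant dimensions, a regularized least-squares classifier written
directly in NumPy, a small grid of regularization values, and plain
5-fold cross-validation. The function names and all constants are
illustrative, and reporting the best cross-validated score over the
grid is slightly optimistic. One synthetic data set proves nothing in
general; swap in your own data.

```python
import numpy as np

def make_data(n=400, n_informative=5, n_noise=45, seed=0):
    """Synthetic two-class data: a few useful dimensions plus pure noise."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_informative)
    X_inf = rng.normal(size=(n, n_informative))
    y = np.sign(X_inf @ w_true + 0.5 * rng.normal(size=n))
    y[y == 0] = 1.0
    X_noise = rng.normal(size=(n, n_noise))
    return np.hstack([X_inf, X_noise]), y

def fit_rls(X, y, lam):
    """Regularized least squares: solve (X'X + lam I) w = X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pca_fit(X, k):
    """Return the training mean and the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def cv_accuracy(X, y, lam, k_pca=None, folds=5):
    """Mean held-out accuracy; PCA is fit on the training fold only."""
    idx = np.arange(len(y))
    accs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        Xtr, Xte = X[train], X[test]
        if k_pca is not None:
            mean, V = pca_fit(Xtr, k_pca)
            Xtr, Xte = (Xtr - mean) @ V, (Xte - mean) @ V
        w = fit_rls(Xtr, y[train], lam)
        accs.append(np.mean(np.sign(Xte @ w) == y[test]))
    return float(np.mean(accs))

def best_over_lambda(X, y, k_pca=None, lams=(0.01, 0.1, 1, 10, 100, 1000)):
    """Best cross-validated accuracy over a small regularization grid."""
    return max(cv_accuracy(X, y, lam, k_pca) for lam in lams)

X, y = make_data()
results = {"no PCA": best_over_lambda(X, y)}
for k in (5, 10, 25):
    results[f"PCA k={k}"] = best_over_lambda(X, y, k_pca=k)

for name, acc in results.items():
    print(f"{name:>10s}: {acc:.3f}")
```

The thing to look at is whether any of the PCA rows beats the "no PCA"
row once the regularization parameter has been tuned for each.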

Hope these thoughts help.

Cheers,

rif

[ comp.ai is moderated ... your article may take a while to appear. ]