Re: PCA without Mean Centering



Amac Herdagdelen wrote:

If I center the feature vectors around 0 by subtracting the mean values, I get a full (dense) matrix (in fact, I cannot even store the resulting matrix in memory) and I am unable to process the data any further. The data values in the matrix are binary. Because the matrix is sparse, most of the variables (i.e. features or columns) have a mean close to 0, with a few exceptions whose mean can be as high as 1.

Amac, I'll let others comment on whether or not trying to do PCA on a binary adjacency matrix is a sensible thing to do. But your question about mean centering is worth worrying about. I don't have a real answer to your question, just an observation.

In the usual context of PCA, you want to mean center your data, because otherwise the first component does not really describe the direction of largest variation in the data; rather, it tends to describe the mean of the data, or at least some combination of the mean and the direction of largest variation. That may be what's desired in some cases, but it's not the usual thing to do with PCA. Usually in PCA, one is interested in the directions of variation _about the mean_.
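
To make this concrete, here is a small sketch in Python/NumPy rather than MATLAB. The point cloud is made up purely for illustration: it is spread mostly along the x axis but centered at (0, 0, 5). The helper name `first_component` is hypothetical; it just takes the first right singular vector of the data, with or without subtracting the column means first.

```python
import numpy as np

def first_component(X, center=True):
    """Unit vector of the first principal axis of the rows of X.

    center=True subtracts the column means first (ordinary PCA);
    center=False applies the SVD to the raw, uncentered data.
    """
    if center:
        X = X - X.mean(axis=0)
    # Rows of vt are the right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n = 1000
# Hypothetical cloud: standard deviations (1, 0.3, 0.3), center at (0, 0, 5).
spread = np.array([1.0, 0.3, 0.3])
offset = np.array([0.0, 0.0, 5.0])
X = rng.standard_normal((n, 3)) * spread + offset

mean_dir = offset / np.linalg.norm(offset)
x_axis = np.array([1.0, 0.0, 0.0])

for center in (True, False):
    v = first_component(X, center)
    # Singular vectors are only defined up to sign, so compare absolute cosines.
    print(f"center={center}: |cos| with x axis = {abs(v @ x_axis):.3f}, "
          f"|cos| with mean = {abs(v @ mean_dir):.3f}")
```

With these made-up numbers the centered first component should line up with the x axis (the real direction of largest variation), while the uncentered one should point at the center of the cloud instead.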

Imagine a cloud of points in 3-space, and imagine a vector pointing from the origin to the center of the cloud. Is the cloud's center near the origin, relative to the size of the cloud? Or far away? Roughly speaking, that ratio, the distance of the center from the origin compared with the spread of the cloud, is what determines how much skipping mean centering will affect the results of the PCA.
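
The "near or far" question can be sketched without any random sampling. For uncentered data, (1/n) X'X is exactly the covariance matrix (normalized by n) plus the outer product of the mean vector, so the first uncentered component is the top eigenvector of that sum. The covariance, the mean direction `u`, and the helper names below are hypothetical choices for illustration; `distance` is measured in units of the cloud's largest standard deviation.

```python
import numpy as np

def top_eigvec(M):
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1]

# Hypothetical cloud: dominant axis x (std 1), other axes std 0.3,
# with the center placed along the oblique direction u.
cov = np.diag([1.0, 0.3**2, 0.3**2])
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
x_axis = np.array([1.0, 0.0, 0.0])

def uncentered_axis(distance):
    mu = distance * u
    # Second-moment matrix of the uncentered data: covariance + mean outer product.
    second_moment = cov + np.outer(mu, mu)
    return top_eigvec(second_moment)

for d in (0.0, 0.5, 1.0, 2.0, 5.0):
    v = uncentered_axis(d)
    print(f"distance={d:3.1f}: |cos| with x axis = {abs(v @ x_axis):.3f}, "
          f"|cos| with mean direction = {abs(v @ u):.3f}")
```

With the center at the origin the leading axis is just the x axis; as the center moves away, the leading uncentered axis swings toward the mean direction, passing through mixtures of the two along the way.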

Hope this helps.