Re: % variance explained - princomp v.s. linear regression



On 2006-09-12 09:07:24 +0100, "joa" <joachim.vandekerckhove@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> said:


"prozuzu" <prozuzu@xxxxxxxxxxx> wrote in message news:4505e3bd$0$559$ed2619ec@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi:

I am really confused by the percentage variance explained in the 'princomp' function.

Say I have a 94x8 data matrix and I ran a principal component analysis using princomp:
[c, s, variances] = princomp(data)

The function returns 8 PCs, and I want to know the percentage of variance the first two PCs explain. The Matlab documentation says to compute it as:

percent_explained = 100*variances/sum(variances)

That's all right for me.

But at the same time, for the same matrix, I also tested linear regression. Here I assume the data set is a weighted sum of 2 basic components (predefined by me), and the weights for these components are obtained with Matlab's 'regress' function. To test how well the regression explains my data, I use the R-square output from 'regress', which can also be interpreted as the percentage of variance explained.

So my question is: on one side, I know the percentage of variance explained by my first two PCs from principal component analysis; on the other side, I have the percentage of variance explained by the two components from least-squares linear regression. Can I directly compare these two results and conclude something like "the regression explains the data better than the PCs"? Or do I have to do something else to compare the two methods directly? Thanks a lot!


prozuzu

You can compare them, but unless something's gone horribly wrong, an 8-component PCA will have a higher R^2 than a bivariate linear regression. If you really want to compare them in an interesting way, compare information criteria such as AIC, small-sample AIC, BIC, etc.
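
For readers unfamiliar with the criteria mentioned above, their usual least-squares forms (Gaussian errors, additive constants dropped) can be sketched as follows. The symbols are assumptions introduced here, not taken from the thread: n is the number of observations, k the number of estimated parameters, and RSS the residual sum of squares. How to count k for a PCA model is a judgment call that the thread does not settle.

```latex
\begin{align*}
\mathrm{AIC}  &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
\mathrm{BIC}  &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n
\end{align*}
```

Lower values are better in each case; the criteria trade goodness of fit against the number of parameters, which raw R^2 does not.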

Thanks a lot, joa. I only want to know how well the first 2 PCs do compared with my regression method. So far the regression actually does a better job than the PCA. What I am not sure about is how Matlab computes the percentage of variance explained. Apparently it assumes the 8 returned PCs explain all the variance of the data, and each PC's percentage of variance explained is the ratio of that PC's variance to the sum of the variances of all 8 PCs.

I actually tried to predict the data using the first 2 PCs, so predicted_data = w1*PC1 + w2*PC2, and I then computed the R-square between predicted_data and the actual data. That R-square value is different from the percentage of variance explained obtained from princomp, and that's where most of my confusion comes from. Any insight? Thanks a lot!

prozuzu
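
A possible explanation for the mismatch, sketched under stated assumptions rather than as a diagnosis of the poster's actual code: the cumulative percentage from princomp equals an R-square only when that R-square is computed on column-centered data and pooled over all 8 columns at once. The notation below is introduced for illustration: X is the 94x8 data matrix with column means subtracted (princomp centers the data internally), the lambda_j are the entries of `variances`, V_k holds the first k columns of `c`, and X-hat_k is the rank-k reconstruction.

```latex
\begin{align*}
\hat{X}_k &= X V_k V_k^{\top} \\
\lVert X \rVert_F^2 &= (n-1)\sum_{j=1}^{p} \lambda_j,
\qquad
\lVert X - \hat{X}_k \rVert_F^2 = (n-1)\sum_{j=k+1}^{p} \lambda_j \\
R^2_{\text{pooled}}
  &= 1 - \frac{\lVert X - \hat{X}_k \rVert_F^2}{\lVert X \rVert_F^2}
   = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j}
\end{align*}
```

With k = 2 and p = 8 the right-hand side is the sum of the first two entries of percent_explained divided by 100. An R-square computed differently will generally not match it, for example one computed without subtracting the column means, one computed per column and then averaged, or one taken as a squared correlation between the two matrices.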
