Re: % variance explained - princomp v.s. linear regression
- From: prozuzu <prozuzu@xxxxxxxxxxx>
- Date: Tue, 12 Sep 2006 09:45:25 +0100
On 2006-09-12 09:07:24 +0100, "joa" <joachim.vandekerckhove@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> said:
"prozuzu" <prozuzu@xxxxxxxxxxx> wrote in message news:4505e3bd$0$559$ed2619ec@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi:
I am really confused by the percentage of variance explained in the 'princomp' function.
Say I have a 94x8 data matrix and I ran a principal component analysis using princomp:
[c, s, variances] = princomp(data)
The function returns 8 PCs, and I want to know the percentage of variance the first two PCs explain. The MATLAB documentation says to compute it as:
percent_explained = 100*variances/sum(variances). That's all right for me.
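A minimal sketch of the computation quoted above, written with base MATLAB functions only so that it does not depend on the Statistics Toolbox. It relies on the fact that the third output of princomp holds the eigenvalues of the sample covariance matrix. The random matrix is a hypothetical stand-in for the 94x8 data set, and the variable first_two is an illustrative name, not something from the original post.

```matlab
% Hypothetical stand-in for the 94x8 data matrix described in the post.
rng(0);
data = randn(94, 8) * randn(8, 8);

% Sample covariance of the columns (cov removes the column means itself).
C = cov(data);
C = (C + C') / 2;   % guard against tiny floating-point asymmetry

% The eigenvalues of the covariance matrix are the variances of the PCs.
variances = sort(eig(C), 'descend');

% Percentage of total variance attributed to each PC.
percent_explained = 100 * variances / sum(variances);

% Share attributed to the first two PCs together.
first_two = sum(percent_explained(1:2));
```

By construction the eight percentages sum to 100, which is the sense in which the full set of PCs "explains all the variance".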
But at the same time, for the same matrix, I also tested it with linear regression. I assume the data set is the weighted sum of 2 basic components (predefined by me), and the weights for these components are obtained with MATLAB's 'regress' function. To test how well the regression explains my data, I have the R-square output from the 'regress' function, which can also be interpreted as the percentage of variance explained.
So my question is: on one side, I know the percentage of variance explained by my first two PCs from principal component analysis; on the other side, I have the percentage of variance explained by the two components from least-squares linear regression. Can I directly compare these two results and conclude something like "the regression explains the data better than the PCs"? Or do I have to do something else to compare the two methods directly? Thanks a lot!
prozuzu
You can compare them, but unless something's gone horribly wrong, an 8-component PCA will have a higher R^2 than a bivariate linear regression. If you want to compare them in a more interesting way, compare information criteria such as AIC, small-sample AIC, BIC, etc.
Thanks a lot, joa. I only want to know how well the first 2 PCs do compared with my regression method. So far the regression actually does a better job than the PCA. What I am not sure about is how MATLAB computes the percentage of variance explained. Apparently it assumes that the 8 returned PCs explain all the variance of the data, and that each PC's percentage of variance explained is the ratio of that PC's variance to the sum of the variances of all 8 PCs.
I actually tried to predict the data using the first 2 PCs, so predicted_data = w1*PC1 + w2*PC2. I then computed the R-square between predicted_data and the actual data, and that R-square value is different from the percentage of variance explained obtained from princomp. That's where most of my confusion comes from. Any insight? Thanks a lot!
prozuzu
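One way to check the reconstruction described above is sketched below. It is only an illustration under stated assumptions: the data are a hypothetical random 94x8 matrix, the column means are removed before the decomposition (as princomp does), and the R-square is pooled over all 8 columns. Under those assumptions the pooled R-square of a 2-PC reconstruction should match the summed variance fraction of the first two PCs, whereas an R-square computed one column at a time, or on uncentered data, generally will not. The names r2_pooled, r2_per_column and frac_explained are illustrative.

```matlab
% Hypothetical stand-in for the 94x8 data matrix described in the post.
rng(0);
data = randn(94, 8) * randn(8, 8);

% Remove the column means, as princomp does before decomposing.
mu = mean(data, 1);
centered = data - repmat(mu, size(data, 1), 1);

% PCA via the SVD of the centered data: columns of V are the loadings,
% U*S holds the scores (the data expressed in PC coordinates).
[U, S, V] = svd(centered, 'econ');
scores = U * S;

% Reconstruct the centered data from the first k = 2 PCs.
k = 2;
predicted = scores(:, 1:k) * V(:, 1:k)';
residual = centered - predicted;

% R-square pooled over every entry of the matrix.
r2_pooled = 1 - sum(residual(:).^2) / sum(centered(:).^2);

% Variance fraction of the first k PCs, as in percent_explained / 100.
variances = diag(S).^2 / (size(data, 1) - 1);
frac_explained = sum(variances(1:k)) / sum(variances);

% R-square computed separately for each of the 8 columns.
r2_per_column = 1 - sum(residual.^2, 1) ./ sum(centered.^2, 1);
```

Comparing r2_pooled with frac_explained, and both with r2_per_column, shows which definition of R-square a given number corresponds to. Since regress fits one response vector at a time, its R-square is closer in spirit to a single entry of r2_per_column than to the pooled figure.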