kmeans and PCA: cluster number?
- From: Rob Campbell <news@xxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 06 Sep 2005 09:35:29 GMT
Hi,
I have data represented as points in a 5-D principal component space. I
could model the data using 3 to 7 components (out of a maximum of 100) but
5 appears to be the optimum. I'm doing a k-means analysis on the data. I'm
asking kmeans for 5 classes (coincidentally also the number of dimensions in
my PC space) but might want to play with a different number of clusters.
Asking for more clusters than you have dimensions isn't a problem, is it?
Logically I don't see why asking for 10 clusters in 2-D space should be a
problem. I ask because I read somewhere that this was an issue but I
couldn't see why this would be the case.
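To make the question concrete, here is a minimal sketch (not part of the original analysis) of plain Lloyd-style k-means in pure Python, asked for 10 clusters in 2-D. The data are synthetic uniform random points, and the names `sq_dist`, `nearest`, and `kmeans` are illustrative only. In this sketch the number of dimensions enters only through the distance calculation, so the requested k is limited by the number of points, not by the dimensionality.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points (raises ValueError if k > len(points)).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # if a cluster empties, keep its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Synthetic stand-in data: 307 points in a single 2-D cloud (assumed, not real data).
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(307)]

centroids, labels = kmeans(points, 10)
sizes = [labels.count(j) for j in range(10)]
print(sizes)  # number of points assigned to each of the 10 clusters
```

Note that the sketch returns 10 centroids whether or not the data contain 10 natural groups, which is relevant when the points form one continuous cloud.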
Is there anything else I need to be wary of?
Cheers,
Rob
MORE DETAILS IF NEEDED/INTERESTED:
I have 307 data points which I am representing in principal component space.
Following a randomisation test which estimates the signal to noise ratio, I
have settled on using the first 5 components to model the data. Any number
between 3 and 7 might be reasonable. Visual inspection shows that the data
points do not fall into discrete clusters in PC space. There is variation
in density along the data point cloud, but it is quite clearly a single
cloud.
Previous work has used different (no PCA) and subjective measures to divide
data like mine into separate classes. My PCA shows this distinction is
artificial since the data themselves do not suggest the existence of
discrete clusters. Nevertheless, I want to relate what I have done to
previous work so I'm doing kmeans clustering on my data in PC space. I want
to see if this algorithm picks out "clusters" with properties similar to
those suggested by other researchers.
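One possible way to put a number on "similar to previous work", sketched here under the assumption that per-point class labels from the earlier study are available (the post does not say they are), is the Rand index: the fraction of point pairs on which two partitions agree about "same group" versus "different group". The function name `rand_index` and the toy label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs that both partitions treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]  # pair together in partition A?
        same_b = labels_b[i] == labels_b[j]  # pair together in partition B?
        agree += same_a == same_b
        total += 1
    return agree / total


# Toy labels only: cluster numbering is arbitrary, so a relabelled
# but otherwise identical partition still scores 1.0.
identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(identical, crossed)
```

The index ignores how clusters are numbered, which matters because kmeans numbers its clusters arbitrarily. It is not corrected for chance agreement, so values should be read relative to what random labellings of the same data would give.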
I have asked kmeans for 5 clusters as the study closest to mine has visually
partitioned the data into this many classes. The results of doing this are
roughly what I expected.
--
remove FERRET for reply
www.robertcampbell.co.uk