kmeans and PCA: cluster number?



Hi,

I have data represented as points in a 5-D principal component space. I
could model the data using 3 to 7 components (out of a maximum of 100) but
5 appears to be the optimum. I'm doing a k-means analysis on the data. I'm
asking kmeans for 5 classes (coincidentally also the number of dimensions in
my PC space) but might want to play with a different number of clusters.

Asking for more clusters than you have dimensions isn't a problem, is it?
Logically I don't see why asking for 10 clusters in 2-D space should be a
problem. I ask because I read somewhere that this was an issue but I
couldn't see why this would be the case.
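To make the question concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) written with only the Python standard library, asked for 10 clusters in a 2-D space. The data are made up (a single Gaussian cloud of 307 points), and the function name `kmeans` and its parameters are illustrative choices, not any particular package's API. Nothing in the assignment or update step refers to the number of dimensions, which is why the run goes through.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Illustrative data: 307 points in a single 2-D Gaussian cloud (made up).
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(307)]

centroids, labels = kmeans(points, k=10)
sizes = [labels.count(c) for c in range(10)]
print(sizes)  # sizes of the 10 clusters found in 2 dimensions
```

The same function accepts 5-D points unchanged, since distances and means are computed coordinate by coordinate.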

Is there anything else I need to be wary of?

Cheers,
Rob




MORE DETAILS IF NEEDED/INTERESTED:

I have 307 data points which I am representing in principal component space.
Following a randomisation test which estimates the signal to noise ratio, I
have settled on using the first 5 components to model the data. Any number
between 3 and 7 might be reasonable. Visual inspection shows that the data
points do not fall into discrete clusters in PC space. There is variation
in density along the data point cloud, but it is quite clearly a single
cloud.

Previous work has used different (no PCA) and subjective measures to divide
data like mine into separate classes. My PCA shows this distinction is
artificial since the data themselves do not suggest the existence of
discrete clusters. Nevertheless, I want to relate what I have done to
previous work so I'm doing kmeans clustering on my data in PC space. I want
to see if this algorithm picks out "clusters" with properties similar to
those suggested by other researchers.

I have asked kmeans for 5 clusters as the study closest to mine has visually
partitioned the data into this many classes. The results of doing this are
roughly what I expected.
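
One simple way to relate k-means output to an earlier classification is a cross-tabulation of cluster labels against the previously assigned classes. The sketch below assumes both sets of labels are available as equal-length lists; the eight labels shown are invented purely for illustration and are not taken from the data described here.

```python
from collections import Counter

# Hypothetical labels for eight data points (made up for illustration).
kmeans_labels = [0, 0, 1, 1, 2, 2, 2, 0]
previous_classes = ["A", "A", "B", "B", "C", "C", "B", "A"]

# Count how often each (k-means cluster, previous class) pair occurs.
table = Counter(zip(kmeans_labels, previous_classes))

clusters = sorted(set(kmeans_labels))
classes = sorted(set(previous_classes))

# Print one row per k-means cluster, one column per previous class.
print("cluster " + " ".join(f"{c:>3}" for c in classes))
for k in clusters:
    row = " ".join(f"{table[(k, c)]:>3}" for c in classes)
    print(f"{k:>7} {row}")
```

Rows dominated by a single column would indicate a k-means cluster that lines up with one of the earlier classes; counts spread across a row would indicate that it does not.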

--
remove FERRET for reply
www.robertcampbell.co.uk


