Re: Possible to Find the Clusters One by One??
- From: "Milind" <milind.a.joshi@xxxxxxxxx>
- Date: Sun, 23 Jul 2006 09:28:54 GMT
MrElbi wrote:
mathlover wrote:
Ted Dunning wrote:
mathlover wrote:
All the usual clustering methods I am aware of (such as k-means, Fuzzy
k-means, etc.) find all the clusters together. I mean, as you know,
they consider some cost function minimizing which (by an iterative
method for instance) finds all the clusters.
As a trivial answer, you can just run any of the clustering methods
that you are familiar with with a progressively larger number of
clusters using the old clusters as approximate seeds for the next set.
This will have some strange properties but will give approximately the
behavior you seek. Since clustering is generally O(m n), this is
reasonably efficient.
More interestingly, there are heirarchical clustering algorithms that
divide the data set into progressively finer divisions. Look for
dendograms, for instance.
This not really what you are asking for, however, since the neither of
these really find a single cluster in the sense you are asking.
If you know something about the distribution of the data that you are
looking at, you might be able to define something better. For
instance, you can define a cost function such that it must describe a
membership rule for a subset of your data and the distribution of
instances in that subset. It would be best fi the membership function
is based on a distance formulation. You would need to add cost for
poor description of the distribution of the data in the subset and for
the number of data instances outside the subset (or, inversely, use the
average log of the probability of the members of the subset plus the
size of the subset). There is probably an interesting mixture model
that could be used for this, but the key thing is that can't penalize
having a small subset too heavily, nor should you penalize a model for
an inability to describe the distribution of the excluded instances.
You can then progressively add subsets.
This is much closer to what you are looking for, but I have no idea if
it will actually work well on your data.
Dear Ted Dunning,
I think I might have badly phrased my question. Indeed, in the problem
I am working on simple k-means clustering attains satisfying quality.
Of course, I know the total number of clusters (so I can use the
k-means clustering).
However, because of the very large size of the problem it takes a lot
of time to find all the clusters (I mean using k-means). But, the point
is that I don't need all the clusters the k-means finds in the later
stages of my problem. Actually, I even know exactly how many of them I
need.
Thus, I am looking for a method that gives in its output clusters near
those k-means gives, but it can find them one at a time (not like
k-means that finds all of them together) so that I can quit after
finding the exact number of clusters I need.
Best regards,
mathlover.
Hi,
You should search informations about the BFR algorithm. It is a
partitional algorithm which is used to do clustering on very huge
datasets: the datas are treated by pages, which seems to be close to
what you are looking for.
Regards,
What about running the clustering algorithm on a small section of your
data set partitioned randomly into 'n' pieces, where 'n' is a size
chosen based on your resource constraints?
Regards,
Milind Joshi
[ comp.ai is moderated ... your article may take a while to appear. ]
.
- References:
- Possible to Find the Clusters One by One??
- From: mathlover
- Re: Possible to Find the Clusters One by One??
- From: MrElbi
- Possible to Find the Clusters One by One??
- Prev by Date: Re: Distance between two instances?
- Next by Date: Re: Possible to Find the Clusters One by One??
- Previous by thread: Re: Possible to Find the Clusters One by One??
- Next by thread: Re: Possible to Find the Clusters One by One??
- Index(es):
Relevant Pages
|