Re: Finding similar entries
- From: "Pekka " <pekka.nospam.kumpulainen@xxxxxxxxxxxxx>
- Date: Tue, 26 Feb 2008 18:00:22 +0000 (UTC)
"John D'Errico" <woodchips@xxxxxxxxxxxxxxxx> wrote in
message <fq17qb$2b2$1@xxxxxxxxxxxxxxxxxx>...
> "Daniel " <daniel4738@xxxxxxxxxxx> wrote in message
> <fq0ple$qos$1@xxxxxxxxxxxxxxxxxx>...
>> I have a problem I can't seem to find the solution to.
>> I have a collection of 25000 observations of 8 variables.
>> I want to find entries which are similar to each other.
>> i.e. each entry is a galaxy with 8 parameters, I want to
>> find a galaxy which has similar properties to one I select.
>> There must be an easy way, can someone perhaps suggest
>> something?
> It's relatively easy.
> The simple solution is to compute an interpoint
> distance matrix. There are several such tools on
> the file exchange, or use pdist from the stats TB.
> But these will fail on a 25000 point set.
> I've written a code that allows you to find only
> those distances below some limit, or only the
> single nearest neighbor. I'd been planning on
> putting it on the file exchange when I got a
> round tuit. I'll do so today. E-mail me if you
> want it sooner.
> John
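For scale: 25000 points give 25000*24999/2, about 3.1e8, pairwise distances, roughly 2.5 GB in double precision, which is why a full distance matrix is a problem. John's File Exchange code is not shown in this thread, so the following is only a brute-force sketch of the idea he describes (find only the distances below some limit), not his implementation. The toy data and the limit tol are hypothetical.

```matlab
% Hypothetical toy data standing in for the 25000-by-8 matrix x
x = [0 0; 1 0; 0 2; 5 5; 5.5 5];
tol = 1.2;                 % hypothetical distance limit
n = size(x, 1);

% Collect pairs (I(m), J(m)) with I(m) < J(m) whose distance D(m) <= tol
I = zeros(0, 1); J = zeros(0, 1); D = zeros(0, 1);
for i = 1:n-1
    % distances from point i to every later point, so each pair is seen once
    d = sqrt(sum(bsxfun(@minus, x(i+1:n, :), x(i, :)).^2, 2));
    hit = find(d <= tol);
    hit = hit(:);                      % force a column even when d is scalar
    I = [I; i * ones(numel(hit), 1)];  %#ok<AGROW>
    J = [J; i + hit];                  %#ok<AGROW> offset back to global row index
    D = [D; reshape(d(hit), [], 1)];   %#ok<AGROW>
end
```

Memory now grows with the number of close pairs rather than with n^2, at the cost of n-1 passes over the data. For only the single nearest neighbour of each point you would keep a running minimum instead of collecting hits.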
If you only need the similarity to the one you have
selected, then you don't need the interpoint distance
matrix. Distance to the selected one should be enough:
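If x is the 25000-by-8 data and myx is the selected entry (1-by-8):
dx = bsxfun(@minus,x,myx); % difference of every row from myx (x - myx also works in R2016b and later)
% Euclidean distance, for example
Ed = sqrt(sum(dx.^2,2)); % 25000-by-1 vector of distances to myx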
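Then it is up to you to decide what is close enough for
you: [sEd,ind] = sort(Ed); sorts the distances in ascending
order, so ind(1:k) are the indices of the k closest entries
(if myx is itself a row of x, it appears at the top with distance 0).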
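Both ways of picking "the small ones", keeping the k closest entries or keeping everything under a cut-off, can be sketched on toy data as follows. The data, myx, k, and tol below are hypothetical values chosen only for illustration.

```matlab
% Hypothetical toy data standing in for the 25000-by-8 matrix x
x = [0 0; 1 0; 0 2; 5 5];
myx = [0.9 0.1];          % hypothetical selected entry (1-by-2 here)

% Euclidean distance from every row of x to myx
Ed = sqrt(sum(bsxfun(@minus, x, myx).^2, 2));
[sEd, ind] = sort(Ed);    % ascending, so the closest entries come first

k = 2;                    % hypothetical: how many neighbours to keep
nearest = ind(1:min(k, numel(ind)));

tol = 1;                  % hypothetical distance cut-off
within = find(Ed <= tol); % every entry at most tol away from myx
```

Here nearest is ordered by distance, while within is ordered by row index.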
Another solution is k-means clustering, which does not need
the huge interpoint distance matrix. It is included in the
Statistics Toolbox. If you don't have that, k-means is also
available for free, at least in the SOM Toolbox:
www.cis.hut.fi/projects/somtoolbox/
But even if two points are in the same cluster, they are not
necessarily very similar; you would still need to measure
the similarity somehow, so I would go for a direct distance
measure.
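The caveat about clusters can be illustrated with a minimal sketch: a plain Lloyd's-iteration k-means written out by hand (not the Statistics Toolbox kmeans or the SOM Toolbox code), followed by ranking the members of the selected point's cluster by direct distance. The toy data, k, the initial centroids, and sel are hypothetical; with real data you would more likely call kmeans(x, k) from the Statistics Toolbox.

```matlab
% Hypothetical toy 2-D data with two well-separated groups
x = [0 0; 0.2 0; 0 0.3; 5 5; 5.2 5; 5 4.8];
k = 2;
C = x([1 4], :);                   % hypothetical deterministic initial centroids
for iter = 1:100
    % squared distance from every point to every centroid (n-by-k)
    d2 = zeros(size(x, 1), k);
    for c = 1:k
        d2(:, c) = sum(bsxfun(@minus, x, C(c, :)).^2, 2);
    end
    [~, idx] = min(d2, [], 2);     % assign each point to its nearest centroid
    Cnew = C;
    for c = 1:k
        if any(idx == c)           % leave an empty cluster's centroid unchanged
            Cnew(c, :) = mean(x(idx == c, :), 1);
        end
    end
    if isequal(Cnew, C), break; end   % assignments have stabilised
    C = Cnew;
end

% Cluster membership alone does not imply similarity, so rank the
% candidates in the selected point's cluster by direct distance
sel = 1;                           % hypothetical selected row
cand = find(idx == idx(sel));
dc = sqrt(sum(bsxfun(@minus, x(cand, :), x(sel, :)).^2, 2));
[~, order] = sort(dc);
ranked = cand(order);
```

Clustering only narrows the candidate set here; the final ranking still comes from the direct distance, as suggested above.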