Re: Feature selection and K-means clustering
- From: "Ting Su" <Ting.Su@xxxxxxxxxxxxx>
- Date: Wed, 25 Mar 2009 16:49:52 -0400
The function 'sequentialfs' can be applied to both supervised learning
algorithms (such as regression and classification algorithms) and
unsupervised algorithms (such as clustering algorithms). If your goal is to
select features for a clustering algorithm, it is possible to use clustering
as the criterion function passed to 'sequentialfs'.
Note that there are two things you may want to pay attention to:
1. Clustering usually does not require separate training and test sets.
You just run the clustering algorithm on one data set and compute
the criterion value on the same data set. Therefore, the 'cv' option in
sequentialfs needs to be 'none'.
2. The function 'sequentialfs' chooses the candidate feature subset that
minimizes the criterion value. Larger silhouette values indicate better
separated clusters, so to use the silhouette value as the criterion you
need to negate it.
Here is a simple example:
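load fisheriris                    % provides meas, a 150-by-4 data matrix
X = randn(150,4);                  % four columns of pure noise
X = [meas X];                      % features 5, 6, 7, 8 are noisy features
% Criterion: negative sum of silhouette values for a 3-cluster k-means fit
clustf = @(X)(-sum(silhouette(X,kmeans(X,3,'Replicates',10))));
% Let sequentialfs decide how many features to keep
[fs1,history1] = sequentialfs(clustf,X,'cv','none')
% Force the selection of exactly 4 features
[fs2,history2] = sequentialfs(clustf,X,'cv','none','NFeatures',4)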
Selecting features for clustering is not easy in general. The silhouette
value was originally proposed as a way to find the number of clusters. It's
hard to say whether using the silhouette values from kmeans is a good way
to select features in your data. It's a good idea to analyze the results
with some domain knowledge.
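One possible way to do such a check on the iris data, where the species labels are known, is sketched below. The subset [3 4] is a hypothetical selection chosen only for illustration, not a result of the run above, and the variable names are assumptions. It compares the mean silhouette value of that subset with the value for all eight features, and cross-tabulates the clusters against the species.

```matlab
load fisheriris                          % provides meas (150-by-4) and species
X = [meas randn(150,4)];                 % columns 5 to 8 are pure noise

subsetA = [3 4];                         % hypothetical selected features (illustration only)
subsetB = 1:8;                           % all features, including the noise

% Cluster each candidate subset with the same k-means settings
idxA = kmeans(X(:,subsetA), 3, 'Replicates', 10);
idxB = kmeans(X(:,subsetB), 3, 'Replicates', 10);

% Mean silhouette value per subset (higher means better separated clusters)
meanSilA = mean(silhouette(X(:,subsetA), idxA));
meanSilB = mean(silhouette(X(:,subsetB), idxB));

% Domain-knowledge check: how do the clusters line up with the known species?
tblA = crosstab(idxA, species);          % rows: clusters, columns: species
```

A table 'tblA' with one dominant entry per row would suggest that the clusters found on the selected features agree with the known grouping. For data without known labels, some other form of domain knowledge has to play this role.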
-Ting Su
The MathWorks, Inc.
<rhojjat@xxxxxxxxx> wrote in message
news:4b634591-d61a-4d15-9570-d4468963c562@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi,
Goal: classify normal and abnormal behavior
I have up to 36 features, and the extracted results are already
divided into training and test sets.
I've calculated the scatter matrix for each feature as a linear
separability factor. In order to find an indicator for the nonlinear
separability, I've been asked to apply K-means clustering. It seems
reasonable to use the silh-factor from
"[silh,h] = silhouette(X,icx,'sqeuclidean')" as a measure of the
separability of the clusters
(I would like to hear a better solution, if you have one).
When it comes to feature selection based on the results from
clustering, I am wondering whether it is possible to combine k-means
and sequentialfs. In other words, is it possible to use clustering
as the criterion function applied in sequentialfs?
Feel free to e-mail me your response (rhojjat-at-gmail.com)
Kind Regards
Parsa