Re: Feature selection and K-means clustering



The function 'sequentialfs' can be used with both supervised learning
algorithms (such as regression and classification algorithms) and
unsupervised algorithms (such as clustering algorithms). If your goal is to
select features for a clustering algorithm, you can use a clustering-based
measure as the criterion function passed to 'sequentialfs'.



There are two things to pay attention to:

1. Clustering usually does not require separate training and test sets.
You run the clustering algorithm on one data set and compute the criterion
value on that same data set. Therefore, the 'cv' option in sequentialfs
needs to be 'none'.

2. The function 'sequentialfs' chooses the candidate feature subset that
minimizes the criterion value. A larger silhouette value indicates
better-separated clusters, so to use the silhouette value as the criterion
you need to pass its negative.



Here is a simple example:

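load fisheriris                 % provides meas (150x4 measurements)

X = randn(150,4);               % four columns of pure noise

X = [meas X];                   % features 5,6,7,8 are the noisy features

% Criterion: negative sum of silhouette values for a 3-cluster kmeans fit
clustf = @(X)(-sum(silhouette(X,kmeans(X,3,'Replicates',10))));

% Let sequentialfs decide how many features to keep
[fs1,history1] = sequentialfs(clustf,X,'cv','none')

% Force the selection of exactly 4 features
[fs2,history2] = sequentialfs(clustf,X,'cv','none','NFeatures',4)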
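
To see which columns were picked and how the criterion changed at each
step, the outputs can be inspected roughly as follows. This is only a
sketch: it relies on the 'In' and 'Crit' fields of the history structure as
described in the sequentialfs documentation, and the rng call is my own
addition so that the noise columns and the kmeans starts are repeatable.

```matlab
rng(1);                         % assumption: fixed seed, not part of the original example
load fisheriris                 % provides meas (150x4)
X = [meas randn(150,4)];        % columns 5-8 are noise
clustf = @(X) -sum(silhouette(X, kmeans(X,3,'Replicates',10)));
[fs,history] = sequentialfs(clustf,X,'cv','none');

selected    = find(fs)          % indices of the selected columns
critPerStep = history.Crit      % criterion value after each step
inPerStep   = history.In        % one row per step: logical mask of features included
```

Because kmeans starts from random centers, the selected subset can change
from run to run unless the seed is fixed.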



Selecting features for clustering is not easy in general. The silhouette
value was originally proposed to find the number of clusters. It's hard to
say whether using the silhouette values from kmeans is good for selecting
features in your data. It's a good idea to analyze the results using some
domain knowledge.
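
On the iris data, one way to bring in domain knowledge is to compare the
clusters found on a candidate subset with the known species labels. The
sketch below assumes, purely for illustration, that the subset is the two
petal measurements (columns 3 and 4 of meas); that choice is hypothetical
and is not a result taken from the example above.

```matlab
load fisheriris                 % meas (150x4), species (150x1 cell array of names)
cols = [3 4];                   % hypothetical feature subset, for illustration only
rng(1);                         % assumption: fixed seed so kmeans is repeatable
idx = kmeans(meas(:,cols),3,'Replicates',10);
meanSil = mean(silhouette(meas(:,cols),idx))   % average silhouette on the subset
tbl = crosstab(idx,species)     % rows: clusters, columns: species
```

If most of the counts in each row of tbl fall into a single column, the
clusters on that subset agree with the known grouping.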



-Ting Su

The MathWorks, Inc.





<rhojjat@xxxxxxxxx> wrote in message
news:4b634591-d61a-4d15-9570-d4468963c562@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi,

Goal: classify normal and abnormal behavior

I have up to 36 features, and the extracted results are already
divided into training and test sets.
I've calculated the scatter matrix for each feature as a linear
separability factor. In order to find an indicator for the nonlinear
separability, I've been asked to apply K-means clustering. It seems
reasonable to use the silh-factor from
"[silh,h] = silhouette(X,icx,'sqeuclidean')" as a measure of the
separability of the clusters (I would like to hear a better solution, if
you have one).

When it comes to feature selection based on the clustering results, I am
wondering whether it is possible to combine k-means and sequentialfs. In
other words, is it possible to use clustering as the criterion function
applied in sequentialfs?

Feel free to e-mail me your response (rhojjat-at-gmail.com)

Kind Regards
Parsa

