Re: HELP!!!



On Nov 12, 4:05 pm, rober...@xxxxxxxxxxxxxxxxxx (Walter Roberson)
wrote:
In article <1194899672.262350.297...@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Greg Heath <he...@xxxxxxxxxxxxxxxx> wrote:

In general, applying unsupervised clustering to a mixture of
data from multiple classes yields a suboptimal cluster-based
classifier. However, cluster-based classification can be
improved significantly if supervised clustering, which makes
use of the class labels, is used instead.
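
A contrived sketch of the distinction above, not code from this thread. It assumes the simplest form of supervised clustering (one centroid per class) and compares it with plain 2-means on the pooled data, where each cluster is labelled by majority vote. The data, the helper names, and the choice of initial centroids are all made up for illustration; the two classes differ only in x, while most of the variance lies along y.

```python
from collections import Counter

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))

def kmeans(points, init, iters=20):
    """Plain Lloyd iterations from the given initial centroids."""
    centroids = list(init)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def accuracy(points, labels, centroids, centroid_labels):
    hits = sum(centroid_labels[nearest(p, centroids)] == y
               for p, y in zip(points, labels))
    return hits / len(points)

# Synthetic data (an assumption): class A sits at x = -1, class B at x = +1.
ys = (-10, -8, -6, 6, 8, 10)
points = [(-1.0, float(y)) for y in ys] + [(1.0, float(y)) for y in ys]
labels = ["A"] * 6 + ["B"] * 6

# Unsupervised: cluster the pooled data, then name each cluster by majority vote.
unsup = kmeans(points, init=[points[0], points[-1]])
votes = [Counter() for _ in unsup]
for p, y in zip(points, labels):
    votes[nearest(p, unsup)][y] += 1
unsup_labels = [v.most_common(1)[0][0] for v in votes]
unsup_acc = accuracy(points, labels, unsup, unsup_labels)

# Supervised: use the labels to build one centroid per class.
classes = sorted(set(labels))
sup = [mean([p for p, y in zip(points, labels) if y == c]) for c in classes]
sup_acc = accuracy(points, labels, sup, classes)

print("unsupervised:", unsup_acc, "supervised:", sup_acc)
```

On this toy set the pooled clustering splits along y and mixes the classes, while the per-class centroids split along x. Real data will rarely be this clear-cut.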

*If*, that is, the class labels are correct. Which turns
out to be a problem in practice. It is unfortunately not "rare"
for us to receive datasets in which samples have been misclassified.

The "Gold Standard" is classification by a trained, experienced human
expert, but even experts make mistakes or are misled by the data
subset that they examine to classify by (e.g., the visual shape of a
cell). We have found that for some datasets our unsupervised
classification methods have an accuracy significantly exceeding the
"Gold Standard".
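
One hedged way to act on this is to cluster without the labels and then list the samples whose supplied label disagrees with the majority label of their cluster. The sketch below does that for a one-dimensional feature; the values, the labels, and the function name two_means_1d are invented for illustration. A disagreement only marks a sample for review, it does not show that the label is wrong.

```python
from collections import Counter

def two_means_1d(values, iters=20):
    """2-means on scalars, started from the minimum and maximum value."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi

# Synthetic measurements and expert labels (assumptions, not real data).
values = [0.9, 1.0, 1.1, 1.2, 9.8, 10.0, 10.1, 10.3]
labels = ["healthy", "healthy", "diseased", "healthy",
          "diseased", "diseased", "diseased", "diseased"]

lo, hi = two_means_1d(values)
cluster = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]

# Majority label within each cluster.
votes = [Counter(), Counter()]
for c, y in zip(cluster, labels):
    votes[c][y] += 1
majority = [v.most_common(1)[0][0] for v in votes]

# Samples whose label disagrees with their cluster's majority.
suspects = [i for i, (c, y) in enumerate(zip(cluster, labels))
            if y != majority[c]]
print("indices to re-examine:", suspects)
```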

A related issue that we deal with a lot is that when the datasets
contain large amounts of data (e.g., most any of the modern medical
"scanners" such as CT, MRS, MRI, infra-red), humans have a lot of
difficulty in perceiving the abstract multidimensional patterns
needed in order to create class labels in the first place. Spectral
noise certainly doesn't help!

Supervised classification is great if you already know exactly
what you are looking for, but it is not very good at figuring out
new relationships. If you have your eye on peaks in the oxygen
flow, you are likely to completely miss the much better correlation
with (say) the calcium concentration information...
--

That is why I have always recommended (search on greg-heath
pretraining advice) that unsupervised methods such as unsupervised
clustering and principal component analysis be used, before
supervised learning, in order to torture the data until they confess.
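
As a small illustration of the principal component step, here is a sketch that finds the first principal component by power iteration on the covariance matrix and projects the data onto it. It is not the procedure from the earlier posts; the data and function names are assumptions, and in practice a library routine would normally be used.

```python
def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Population covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=100):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Synthetic two-feature data lying close to the line y = x (an assumption).
rows = [[1.1, 0.9], [1.9, 2.1], [2.9, 3.1], [4.1, 3.9]]

centered = center(rows)
cov = covariance(centered)
pc1, lam = first_component(cov)

# Share of total variance captured by the first component.
explained = lam / sum(cov[i][i] for i in range(len(cov)))

# One-dimensional scores that could feed a later clustering or supervised step.
scores = [sum(a * b for a, b in zip(r, pc1)) for r in centered]
print("pc1:", pc1, "explained variance ratio:", explained)
```

If a single component carries nearly all of the variance, as it does here, the scores are a compact input for whatever clustering or supervised step follows.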

Hope this helps.

Greg
