[R] which function to use to do classification

Martin Maechler maechler at stat.math.ethz.ch
Wed Mar 29 09:14:06 CEST 2006


>>>>> "Baoqiang" == Baoqiang Cao <caobg at email.uc.edu>
>>>>>     on Wed, 29 Mar 2006 00:46:01 -0500 writes:

    Baoqiang> Thanks!
    Baoqiang> I tried kmeans, but the results are not very positive. Anyway, thanks Jacques! Please let me know if you have any other thoughts!

My first recommendation would have been pam(),
but Jacques mentioned that as well.

HOWEVER, note that many (unfortunately, nowadays even "most") people
doing cluster analysis have forgotten (or never knew) the importance
of the "similarity" / "dissimilarity" / "distance" that underlies
almost all clustering methods
(see the functions dist() and cluster::daisy()).
The choice of dissimilarity includes variable transformation,
selection, etc.; these are things which need thinking in addition to
software....

If you don't get "very positive" results it could well be that
you should start considering the above.
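
As a rough sketch of what "considering the above" could look like
(assumptions on my side: the four numeric columns of the built-in iris
data stand in for your N*M matrix, k = 3 as in your question, and the
three dissimilarities are arbitrary illustrations, not recommendations):

```r
library(cluster)  # provides daisy() and pam()

# Assumption: the four numeric iris columns stand in for the N*M matrix.
x <- as.matrix(iris[, 1:4])

# Three candidate dissimilarities computed on the same data.
d_raw   <- dist(x)                         # Euclidean, raw units
d_std   <- dist(scale(x))                  # Euclidean after centering/scaling columns
d_daisy <- daisy(x, metric = "manhattan",  # Manhattan; with stand = TRUE,
                 stand = TRUE)             # daisy() standardizes the columns itself

# pam() accepts a dissimilarity object directly; k = 3 as in the question.
fits <- lapply(list(raw = d_raw, std = d_std, daisy = d_daisy), pam, k = 3)

# Average silhouette width for each choice of dissimilarity.
sapply(fits, function(f) f$silinfo$avg.width)

# Cross-tabulate two of the partitions to see where they disagree.
table(raw = fits$raw$clustering, std = fits$std$clustering)
```

Each silhouette width is computed on its own dissimilarity, so treat
the numbers as a rough guide only; the point is that the partition can
change when the dissimilarity does.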

Martin Maechler, ETH Zurich


    Baoqiang> ======= At 2006-03-29, 00:08:44 you wrote: =======

    >> if you want to classify rows or columns, read:
    >> ?hclust
    >> ?kmeans
    >> library(cluster)
    >> ?pam
    >> 
    >> 
    >> Baoqiang Cao wrote:
    >> 
    >>> Dear All,
    >>> 
    >>> I have data in the form of an N*M matrix. All I want is to classify it into, let's say, 3 classes. Which method(s) do you think is (are) appropriate for this purpose? Any reference will be welcome! Thanks!
    >>> 
    >>> Best, 
    >>> Baoqiang Cao



