[R] which function to use to do classification

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Mar 29 13:43:52 CEST 2006


On Wed, 29 Mar 2006, Sean Davis wrote:

> We have to be careful here.  Classification (which is the terminology that
> the original poster used) is NOT the same as clustering, although the two
> are often confused.

Well, in one of its two English senses it is the same.  From a recent talk 
of mine (GfKL30), quoting the Concise Oxford Dictionary:

\emph{Classification} has two senses:

\begin{itemize}
\item `to arrange in classes or categories'
\item `assign (a thing) to a class or category'
\end{itemize}

There is a community (q.v. the International Federation of Classification 
Societies and Journal of Classification as well as the entry in the 
original Encyclopedia of Statistical Sciences) that means (almost) 
entirely the first sense.

To add to this, the similar words to classification in e.g. French or 
German have (I am told) different shades of meaning.


> If the original poster wants to do clustering and
> examine the results for the presence of three clusters, that is fine and
> there are many methods for clustering that could be used.  However,
> classification will require a different set of tools.  If the clustering
> tools already pointed out are not doing what is needed (that is, that Cao
> actually is interested in clustering and not classification), then perhaps a
> further explanation of what the problem is would help clarify.

Yes, further explanation would help.
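
To make the two senses concrete, here is a minimal sketch. It is not from the thread: the built-in iris data, the choice of lda() from MASS, the average-linkage clustering and the 100-row training split are all assumptions made purely for illustration, and other tools would serve equally well.

```r
library(MASS)  # for lda(); a recommended package shipped with R

# Illustrative data (an assumption): the built-in iris measurements
mat    <- as.matrix(iris[, 1:4])
labels <- iris$Species

# Sense 1, 'to arrange in classes': group the rows without using any labels
hc     <- hclust(dist(mat), method = "average")
groups <- cutree(hc, k = 3)
table(groups)

# Sense 2, 'assign (a thing) to a class': learn from labelled rows,
# then assign the remaining rows to one of the known classes
set.seed(1)
train <- sample(nrow(mat), 100)
fit   <- lda(mat[train, ], grouping = labels[train])
pred  <- predict(fit, mat[-train, ])$class
table(predicted = pred, actual = labels[-train])
```

The first half needs only the matrix; the second half cannot be run at all unless class labels are available for some of the rows, which is the practical difference at issue here.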

> Sean
>
>
> On 3/29/06 1:46 AM, "Jacques VESLOT" <jacques.veslot at cirad.fr> wrote:
>
>> try this (suppose mat is your matrix):
>>
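>> # hierarchical clustering ("ward.D" was called "ward" before R 3.1.0)
>> hc <- hclust(dist(mat, "manhattan"), "ward.D")
>> plot(hc, hang = -1)
>> (x <- identify(hc)) # rightclick to stop
>> cutree(hc, 3)       # cut the tree into 3 groups
>>
>> # k-means with 3 centres
>> km <- kmeans(mat, 3)
>> km$cluster
>> km$centers
>>
>> # partitioning around medoids; pam() and daisy() are in package cluster
>> library(cluster)
>> pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering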
>>
>>
>>
>> Baoqiang Cao wrote:
>>
>>> Thanks!
>>> I tried kmeans, but the results are not very positive. Anyway, thanks Jacques!
>>> Please let me know if you have any other thoughts!
>>>
>>> Best regards,
>>>    Baoqiang Cao
>>>
>>> ======= At 2006-03-29, 00:08:44 you wrote: =======
>>>
>>>
>>>
>>>> if you want to classify rows or columns, read:
>>>> ?hclust
>>>> ?kmeans
>>>> library(cluster)
>>>> ?pam
>>>>
>>>>
>>>> Baoqiang Cao wrote:
>>>>
>>>>
>>>>
>>>>> Dear All,
>>>>>
>>>>> I have some data; suppose it is an N*M matrix. All I want is to classify
>>>>> it into, let's say, 3 classes. Which method(s) do you think is (are)
>>>>> appropriate for this purpose? Any reference will be welcome! Thanks!
>>>>>
>>>>> Best,
>>>>> Baoqiang Cao
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> ______________________________________________
>>>>> R-help at stat.math.ethz.ch mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide!
>>>>> http://www.R-project.org/posting-guide.html
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> = = = = = = = = = = = = = = = = = = = =
>>>
>>> Baoqiang Cao
>>> caobg at email.uc.edu
>>> 2006-03-29
>>>
>>>
>>>
>>>
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

