Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Mar 29 13:43:52 CEST 2006

On Wed, 29 Mar 2006, Sean Davis wrote:

> We have to be careful here.  Classification (which is the terminology that
> the original poster used) is NOT the same as clustering, although the two
> are often confused.

Well, in one of its two English senses it is the same.  From a recent talk
of mine (GfKL30), quoting the Concise Oxford Dictionary:

\emph{Classification} has two senses:

\begin{itemize}
\item `to arrange in classes or categories'
\item `assign (a thing) to a class or category'
\end{itemize}

There is a community (q.v. the International Federation of Classification
Societies and Journal of Classification as well as the entry in the
original Encyclopedia of Statistical Sciences) that means (almost)
entirely the first sense.

To add to this, the similar words to classification in e.g. French or
German have (I am told) different shades of meaning.
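A minimal R sketch of the two senses may help. It assumes the built-in `iris` data purely as a stand-in (they are not part of this thread). `kmeans` arranges unlabelled rows into groups (the first sense), while `lda` from the recommended package MASS learns from labelled rows and then assigns new rows to one of the known classes (the second sense). `lda` is only one of many supervised classifiers, chosen here for illustration.

```r
## Illustrative only: the iris data stand in for an arbitrary matrix.
library(MASS)  # lda(); MASS is a recommended package shipped with R

X <- as.matrix(iris[, 1:4])  # 150 x 4 numeric matrix
y <- iris$Species            # known labels, 3 classes

## Sense 1: arrange in classes (clustering). The labels are NOT used.
set.seed(1)
km <- kmeans(X, centers = 3, nstart = 10)
print(table(cluster = km$cluster, species = y))  # cluster numbers are arbitrary

## Sense 2: assign to a class (classification). Labelled training rows
## define the classes; held-out rows are then assigned to one of them.
set.seed(2)
train <- sample(nrow(X), 100)
fit <- lda(X[train, ], grouping = y[train])
pred <- predict(fit, X[-train, ])$class
print(mean(pred == y[-train]))  # proportion of held-out rows assigned correctly
```

The cross-tabulation shows that k-means numbers its groups arbitrarily and need not recover the species; the LDA step is only possible because class labels were available for training.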

> If the original poster wants to do clustering and
> examine the results for the presence of three clusters, that is fine and
> there are many methods for clustering that could be used.  However,
> classification will require a different set of tools.  If the clustering
> tools already pointed out are not doing what is needed (assuming that Cao
> actually is interested in clustering and not classification), then perhaps a
> further explanation of what the problem is would help clarify.

Yes, further explanation would help.

>> try this (suppose mat is your matrix):
>>
>> hc <- hclust(dist(mat, "manhattan"), "ward.D")  # "ward" in older R
>> plot(hc, hang = -1)
>> (x <- identify(hc)) # right-click to stop
>> cutree(hc, 3)
>>
>> km <- kmeans(mat, 3)
>> km$cluster
>> km$centers
>> library(cluster)  # provides pam() and daisy()
>> pam(daisy(mat, metric = "manhattan"), k = 3, diss = TRUE)$clustering
>> Baoqiang Cao wrote:
>>
>>> Thanks!
>>> I tried kmeans; the results are not very positive. Anyway, thanks Jacques!
>>> Please let me know if you have any other thoughts!
>>> Best regards,
>>>    Baoqiang Cao
>>>
>>>> if you want to classify rows or columns, read:
>>>> ?hclust
>>>> ?kmeans
>>>> library(cluster)
>>>> ?pam
>>>> Baoqiang Cao wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I have some data, say an N*M matrix. All I want is to classify
>>>>> it into, let's say, 3 classes. Which method(s) do you think would be
>>>>> appropriate for this purpose? Any references would be welcome! Thanks!
>>>>> Baoqiang Cao
>>>>>
>>> Baoqiang Cao
>>> caobg at email.uc.edu
>>> 2006-03-29
