[R] Document clustering for R

Jari Oksanen jarioksa at sun3.oulu.fi
Tue Sep 13 15:30:22 CEST 2005


On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
> I'm working on a project related to document clustering. I know that R 
> has clustering algorithms such as clara, but only supports two distance 
> metrics: euclidean and manhattan, which are not very useful for 
> clustering documents. I was wondering how easy it would be to extend the 
> clustering package in R to support other distance metrics, such as 
> cosine distance, or if there was an API for custom distance metrics.
> 
You don't have to extend the "clustering package in R to support other
distance metrics", but you should take care that you produce your
dissimilarities (or distances) in the standard format so that they can
be used in "clustering package" or in cmdscale or in isoMDS or any other
function accepting a "dist" object.  "Clustering package" will support
new dissimilarities if they are written in a standard-conforming way.
There are several packages that offer alternative dissimilarities (and
some even distances) that can be used in clustering functions. Look for
"distances" or "dissimilarities" in the R Site. Some of these could be
the one for you... I would be surprised if cosine index is missing (and
if needed, I could write it for you in C, but I don't think that is
necessary).
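
To make the "standard format" point concrete, here is a minimal sketch that uses only base R (the stats package). The helper cosine_dist() and the toy term counts are hypothetical, made up for illustration and not taken from any package. It assumes that no document has an all-zero term vector. Note that 1 - cosine similarity is a dissimilarity rather than a true metric.

```r
# Hypothetical helper (not from any package): cosine dissimilarity
# between the rows of a document-term matrix, returned as a "dist" object.
# Assumes every row has at least one non-zero entry.
cosine_dist <- function(x) {
  x <- as.matrix(x)
  norms <- sqrt(rowSums(x^2))                  # Euclidean length of each row
  sim <- tcrossprod(x) / outer(norms, norms)   # cosine similarity matrix
  sim[sim > 1] <- 1                            # guard against rounding above 1
  as.dist(1 - sim)                             # dissimilarity in "dist" format
}

# Toy term counts, invented for illustration: rows are documents.
dtm <- rbind(
  doc1 = c(2, 1, 0, 0),
  doc2 = c(4, 2, 0, 0),
  doc3 = c(0, 0, 3, 1),
  doc4 = c(0, 0, 1, 3)
)

d <- cosine_dist(dtm)

# Any function that takes a "dist" object can now use it.
hc <- hclust(d, method = "average")   # hierarchical clustering
groups <- cutree(hc, k = 2)           # cut the tree into two clusters
ord <- cmdscale(d, k = 2)             # classical scaling on the same object
```

The same object d should also work with other functions that take a "dist" object, so nothing in the clustering code itself needs to change.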

cheers, jari oksanen




More information about the R-help mailing list