[R] custom metric for dist for use with hclust/kmeans

Greg Snow Greg.Snow at imail.org
Thu May 6 21:42:15 CEST 2010


The pam function in the cluster package accepts either raw data or a dissimilarity matrix and follows the same general idea as kmeans (it partitions around medoids rather than means).  The daisy function offers more options for creating the dissimilarity matrix; if what you want is not among them, you could still use it as a model for writing your own function.  You could also use the outer function (with a vectorized distance function) together with as.dist to create the distance matrix.
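
As a rough illustration of the outer/as.dist route, here is a minimal sketch, assuming the rows of the data are discrete probability distributions and that a symmetrised KL divergence is the measure of interest. The data p and the helper sym_kl are hypothetical names made up for this example; they are not part of any package.

```r
library(cluster)  # provides pam()

# Hypothetical example data: 6 discrete distributions over 4 outcomes.
set.seed(1)
p <- matrix(runif(24, 0.1, 1), nrow = 6)
p <- p / rowSums(p)  # each row now sums to 1

# Symmetrised KL divergence between rows i and j of p:
# KL(p_i || p_j) + KL(p_j || p_i) = sum((p_i - p_j) * log(p_i / p_j)).
# Assumes strictly positive probabilities, otherwise log() breaks down.
sym_kl <- function(i, j) {
  sum((p[i, ] - p[j, ]) * log(p[i, ] / p[j, ]))
}

# outer() expects a vectorized function, so wrap the helper in Vectorize().
n <- nrow(p)
d <- as.dist(outer(seq_len(n), seq_len(n), Vectorize(sym_kl)))

# hclust() takes the dist object directly.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

# pam() also accepts a dist object, which makes it a stand-in for kmeans here.
fit <- pam(d, k = 2)
fit$clustering
```

Average linkage is used here only because the divergence is not Euclidean; other linkage methods may suit other measures better. For large data sets the n-by-n outer call can get slow, so a loop over the lower triangle or a compiled routine may be worth considering.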

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Vivek Ayer
> Sent: Thursday, May 06, 2010 9:56 AM
> To: r-help at r-project.org
> Subject: Re: [R] custom metric for dist for use with hclust/kmeans
> 
> Bump...no insights on defining custom metrics. Guess I'll give the
> other languages a shot.
> 
> Vivek
> 
> On Wed, May 5, 2010 at 10:13 AM, Vivek Ayer <vivek.ayer at gmail.com>
> wrote:
> > Hi guys,
> >
> > I've been using the kmeans and hclust functions for some time now and
> > was wondering if I could specify a custom metric when passing my data
> > frame into hclust as a distance matrix. Actually, kmeans doesn't even
> > take a distance matrix; it takes the data frame directly. I was
> > wondering if there's a way or if there's a package that lets you
> > create distance matrices from non-standard metrics, e.g.,
> > KL-divergence (which is not really one), but metrics used often in
> > information theory. I'm assuming kmeans just assumes the euclidean
> > metric, but it would be nice to customize that as well. Is there
> > anything out there, or would I have to create my own?
> >
> > Thanks,
> > Vivek
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


