[R] kmeans and incomplete distance matrix concern

Gabor Grothendieck ggrothendieck at gmail.com
Mon Aug 7 17:43:33 CEST 2006


There are many clustering functions in R and in R packages.  Some
take distance objects whereas others, kmeans among them, do not.  What
you read probably referred to hclust or some other clustering function
that does take a distance object.  See ?kmeans for the kmeans function
and also look at the CRAN Task View on clustering for
other clustering functions:

  http://cran.r-project.org/src/contrib/Views/
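
To make the difference concrete, here is a minimal sketch on made-up
data (the matrix x and the choice of two clusters are assumptions for
illustration only, not taken from your data).  kmeans is given the data
matrix itself, while hclust is given the object returned by dist():

```r
# Toy data (hypothetical): 20 observations on 2 variables,
# drawn as two well-separated groups of 10
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# kmeans works on the raw data matrix, not on a distance object
km <- kmeans(x, centers = 2, nstart = 10)

# hclust works on a distance object created by dist()
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two partitions to compare them
tab <- table(kmeans = km$cluster, hclust = hc_groups)
print(tab)
```

Passing dist(x) to kmeans would not give an error for a small case like
this, but it would cluster the rows of the distance matrix as if they
were observations, which is not the same analysis.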

On 8/7/06, Ffenics <ffenics2002 at yahoo.co.uk> wrote:
> Well then I don't understand, because everything I have read so far suggests that you use the dist() function to create a matrix based on the Euclidean distance and then the kmeans() function.
>
> If this is incorrect, then any suggestions as to how to do this properly would be much appreciated.
>
> Christian Hennig <chrish at stats.ucl.ac.uk> wrote: First of all, kmeans doesn't work on distance matrices.
>
> On Mon, 7 Aug 2006, Ffenics wrote:
>
> > Hi there
> > I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
> >
> > i.e:
> >
> > mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
> >               dimnames = list(levels(DF$V1), levels(DF$V2)))
> > mat[cbind(DF$V1, DF$V2)] <- DF$V3
> >
> > This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
> >
> > My query is this: not all the data for the initial matrix (x) exists and therefore the matrix is not fully populated - empty cells are populated with '0's.
> >
> > Could someone please tell me how this may affect the result from the dist() command? A '0' in a distance matrix means that the two variables are identical, doesn't it? I don't want things clustered together simply because there was no information.
> >
> > Is this a problem, and are there ways to work around it? Thanks
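
A small sketch may help here.  It assumes, hypothetically, that cells
with no information can be coded as NA rather than 0; the matrix below
is invented for illustration.  A 0 in the data matrix is treated by
dist() as a real measurement, so cells that are missing in both rows
add nothing to their distance.  With NA, dist() excludes the affected
columns for each pair of rows and scales the sum up in proportion to
the number of columns actually used (see ?dist):

```r
# Hypothetical 3 x 4 data matrix; NA marks cells with no information
x <- matrix(c(1, 2, NA, NA,
              5, 6, NA, NA,
              1, 2,  3,  4),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), NULL))

# Zero-filled version, as in the approach described above
x0 <- x
x0[is.na(x0)] <- 0

# Zeros are treated as real measurements
d_zero <- as.matrix(dist(x0))

# NA columns are dropped pairwise and the sum is rescaled
d_na <- as.matrix(dist(x))

print(round(d_zero, 3))
print(round(d_na, 3))
```

Rows a and c agree on every cell that was observed in both, so the
NA-coded distance between them is 0, whereas zero-filling puts them 5
apart.  Whether either behaviour is appropriate depends on what a
missing cell means in your data.  Note also that kmeans itself does not
accept NA values, so NA coding only helps with functions that work from
the dist object.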
> >
> >  [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>


