[R] kmeans clustering

Mon Apr 14 13:04:20 CEST 2003

On Mon, 14 Apr 2003, pingzhao wrote:

> Hi,
> 
> I am using kmeans to cluster a dataset.
> I test this example:
> 
> > data<-matrix(scan("data100.txt"),100,37,byrow=T)
> (my dataset is 100 rows and 37 columns--clustering rows)
> 
> > c1<-kmeans(data,3,20)
> > c1
> $cluster
>   [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
>  [38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
>  [75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1
> 
> $withinss
> [1] 1037.5987    0.0000  666.9701
> 
> $size
> [1] 68  1 31
> 
> > c4<-kmeans(data,3,20)
> $withinss
> [1]   0.0000 865.7628 851.1214
> 
> $size
> [1]  1 54 45
> 
> Can anyone tell me why the results are so different with the same 
> dataset and parameters when I run the command 'kmeans(data,3,20)' 
> several times?

The help page could tell you:

 centers: Either the number of clusters or a set of initial cluster
          centers. If the first, a random set of rows in `x' are chosen
          as the initial centers. 
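
A minimal sketch of how the random choice of initial centers can be pinned
down, assuming a current version of R in which kmeans() lives in the stats
package. The simulated matrix `x` is a hypothetical stand-in for the
poster's data, and the seed values and the choice of rows 1 to 3 as
starting centers are arbitrary illustrations.

```r
# Simulated stand-in for the 100 x 37 data matrix (hypothetical data)
set.seed(1)
x <- matrix(rnorm(100 * 37), nrow = 100, ncol = 37)

# Setting the same seed before each call makes kmeans() pick the same
# random rows as initial centers, so the two fits agree.
set.seed(42)
fit_a <- kmeans(x, centers = 3, iter.max = 20)
set.seed(42)
fit_b <- kmeans(x, centers = 3, iter.max = 20)
same_seed_agrees <- identical(fit_a$cluster, fit_b$cluster)

# Alternatively, pass a matrix of initial centers (here rows 1 to 3 of x),
# which removes the random choice altogether.
init <- x[1:3, ]
fit_c <- kmeans(x, centers = init, iter.max = 20)
fit_d <- kmeans(x, centers = init, iter.max = 20)
fixed_centers_agree <- identical(fit_c$cluster, fit_d$cluster)

print(c(same_seed_agrees, fixed_centers_agree))
```

Both comparisons should come out TRUE; without the seed or the explicit
centers, repeated calls are free to differ.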

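At the very least, the labellings of the clusters are arbitrary, so the 
same partition can come back with its cluster numbers permuted. Beyond 
that, K-means usually has multiple local minima, so different random 
starts can converge to genuinely different solutions.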
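
A rough sketch of how one might check both effects, again assuming a
current R where kmeans() is in the stats package, accepts an `nstart`
argument and returns a `tot.withinss` component. The matrix `x` is
simulated and purely hypothetical, and the values 10 and 25 are arbitrary.

```r
# Simulated stand-in for the 100 x 37 data matrix (hypothetical data)
set.seed(1)
x <- matrix(rnorm(100 * 37), nrow = 100, ncol = 37)

# Ten single-start runs: the total within-cluster sum of squares can
# differ from run to run when the fits land in different local minima.
single <- sapply(1:10, function(i) {
  kmeans(x, centers = 3, iter.max = 20)$tot.withinss
})
print(range(single))

# nstart = 25 tries 25 random sets of initial centers and keeps the fit
# with the smallest total within-cluster sum of squares.
best <- kmeans(x, centers = 3, iter.max = 20, nstart = 25)
print(best$tot.withinss)

# Cross-tabulating two runs compares the partitions regardless of how the
# clusters happen to be numbered.
run1 <- kmeans(x, centers = 3, iter.max = 20)
run2 <- kmeans(x, centers = 3, iter.max = 20)
agreement <- table(run1$cluster, run2$cluster)
print(agreement)
```

If the two runs found the same partition with permuted labels, the table
has a single non-zero entry in each row and column; counts spread over
several cells point to different local minima.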

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595