[R] kmeans (again)
Liaw, Andy
andy_liaw at merck.com
Fri Jun  6 04:19:35 CEST 2003

Just because you get the same answer from different starting points doesn't
mean the algorithm isn't using the starting points you specified.
I tried:
> set.seed(1)
> x <- matrix(rnorm(12), 6, 2)
> kmeans(x, x[c(1,6),], 1)
$cluster
[1] 2 1 2 1 1 2
$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512
$withinss
[1] 2.86861843 0.04450923
$size
[1] 3 3
> kmeans(x, 2, 1)
$cluster
[1] 2 1 2 1 1 2
$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512
$withinss
[1] 2.86861843 0.04450923
$size
[1] 3 3
> kmeans(x, x[c(3,4),], 1)
$cluster
[1] 1 1 1 2 1 1
$centers
        [,1]       [,2]
1 -0.3538799  0.7406319
2  1.5952808 -0.3053884
$withinss
[1] 2.089050 0.000000
$size
[1] 5 1
which shows that the result *can* depend on the starting values.
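
As a rough illustration on the data from the original post, here is a minimal sketch of a single nearest-center assignment done by hand. The helper `assign_step` is a made-up name for this example, not part of kmeans, and it is not meant to reproduce what kmeans does internally; it only shows which center each point is closest to at the very start.

```r
# Data from the original post: 6 points in 2 dimensions.
dat <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

# Hypothetical helper: label each row of `pts` with the index of the nearest
# row of `centers`, using squared Euclidean distance.
assign_step <- function(pts, centers) {
  d <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(pts, 2, centers[k, ])^2)  # squared distance to center k
  })
  max.col(-d, ties.method = "first")        # column of the smallest distance
}

first_34 <- assign_step(dat, dat[c(3, 4), ])  # start from rows 3 and 4
first_16 <- assign_step(dat, dat[c(1, 6), ])  # start from rows 1 and 6

print(first_34)  # 1 1 1 2 1 2
print(first_16)  # 1 1 1 2 2 2
```

The two starts give different first assignments (point 5 changes cluster), so the starting centers do matter at the first step. Neither assignment matches the 1 1 1 1 2 2 that kmeans reported for both starts, which suggests that one "iteration" of kmeans does more than a single nearest-center pass and can reach the same final partition from either start.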
Andy
> -----Original Message-----
> From: Luis Torgo [mailto:ltorgo at liacc.up.pt]
> Sent: Thursday, June 05, 2003 2:05 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] kmeans (again)
> 
> 
> Regarding a previous question concerning the kmeans function I've tried the
> same example and I also get a strange result (at least according to what is
> said in the help of the function kmeans). Apparently, the function is
> disregarding the initial cluster centers one gives it. According to the help
> of the function:
> 
>  centers: Either the number of clusters or a set of initial cluster
>           centers...
> 
> Now a small dataset:
> > data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
> 
> If I use rows 3 and 4 as cluster centers and a single iteration of kmeans I
> get:
> > kmeans(data,data[c(3,4),],1)
> $cluster
> [1] 1 1 1 1 2 2
> 
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
> 
> $withinss
> [1] 32.9375  6.5000
> 
> $size
> [1] 4 2
> 
> If I now use rows 1 and 6 as cluster centers I get exactly the same solution
> after the first iteration:
> 
> > kmeans(data,data[c(1,6),],1)
> $cluster
> [1] 1 1 1 1 2 2
> 
> $centers
>    [,1] [,2]
> 1 0.875 2.25
> 2 8.000 2.50
> 
> $withinss
> [1] 32.9375  6.5000
> 
> $size
> [1] 4 2
> 
> So, apparently the function is disregarding the initial cluster centers
> information. This is even "confirmed" by the fact that if I use the function
> without cluster centers, simply given the number of clusters, I get the same
> solution:
> > kmeans(data,2,1)
> $cluster
> [1] 2 2 2 2 1 1
> 
> $centers
>    [,1] [,2]
> 1 8.000 2.50
> 2 0.875 2.25
> 
> $withinss
> [1]  6.5000 32.9375
> 
> $size
> [1] 2 4
> 
> 
> 
> -- 
> Luis Torgo
>     FEP/LIACC, University of Porto   Phone : (+351) 22 607 88 30
>     Machine Learning Group           Fax   : (+351) 22 600 36 54
>     R. Campo Alegre, 823             email : ltorgo at liacc.up.pt
>     4150 PORTO   -  PORTUGAL         WWW   : http://www.liacc.up.pt/~ltorgo
> 