[R] finding a stable cluster for kmeans

Wed Sep 26 15:54:33 CEST 2007

On Wed, 26 Sep 2007, Friedrich Leisch wrote:

>>>>>> On Tue, 25 Sep 2007 20:16:05 -0400,
>>>>>> Wiebke Timm (WT) wrote:
>
>  > You might want to check if there is a neural gas algorithm in R.
>  > kmeans generally has a high variance since it is very dependent on
>  > the initialization. Neural gas overcomes this problem by using a
>  > ranked list of neighbouring data points instead of using data points
>  > directly. It is more stable (at the cost of additional computational
>  > time).
>
> Neural gas is in package flexclust on CRAN (one of the clustering
> methods that the function cclust() provides).
>
> I also find it more stable than kmeans for some data, although in
> general I agree with what has been said before in this thread:
> instability is in most cases caused by no clear cluster structure of
> the data, wrong number of clusters etc rather than by the wrong
> cluster algorithm.

I don't understand this use of 'high variance' and 'stable'.  K-means is 
defined by a clear criterion (rare in the clustering field), and so the 
optimal outcome does not depend on the initialization.  Different runs of 
kmeans() may give different clusters, but in that case the algorithm is 
not optimizing the criterion in some (and maybe all) of the runs.  ?kmeans 
clearly says

      The Hartigan-Wong algorithm
      generally does a better job than either of those, but trying
      several random starts is often recommended.

And if there are several clusterings with roughly equally good fit, that 
would indicate that none of them is a uniquely good summary of the data.
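
A minimal sketch of the check described above, assuming simulated data: the 
matrix x and the names single and best are illustrative only and are not 
from the thread.  It compares the value of the criterion (total 
within-cluster sum of squares) over several single-start runs, then lets 
kmeans() pick the best of many starts via its nstart argument.

```r
# Simulated data for illustration: three Gaussian groups in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

# Ten independent runs, each from a single random start.  Any spread in the
# criterion values means that some runs stopped short of the optimum.
single <- replicate(10, kmeans(x, centers = 3, nstart = 1)$tot.withinss)
print(sort(single))

# One call with 25 random starts; kmeans() returns the best of them.
best <- kmeans(x, centers = 3, nstart = 25)
print(best$tot.withinss)
```

If the sorted values differ noticeably, the differing runs are not all 
optimizing the criterion; if many distinct partitions give nearly the same 
value, that matches the case in which no single clustering is a uniquely 
good summary.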

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595