[R] kmeans error (bug?)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Nov 10 09:19:16 CET 2003


This is not a bug.  It just means that the algorithm sometimes ends up 
with an empty cluster; since you asked for 34 clusters and it had 33 or 
fewer, it stops.

What to do in this situation is currently under discussion, but the advice 
given is good: try another set of initial centres.
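
A minimal sketch of "try another set of initial centres", assuming it is 
acceptable simply to rerun kmeans() with a fresh random start whenever it 
fails.  The helper kmeans_retry() and its max_tries argument are 
hypothetical (not part of R), and the matrix x is stand-in data, not 
cavint.p.r.

```r
# Hypothetical helper: call kmeans() repeatedly, each time with a new
# random choice of initial centres, until one call succeeds.
# Note: any error (not only "empty cluster") triggers another attempt.
kmeans_retry <- function(x, k, max_tries = 20) {
  for (i in seq_len(max_tries)) {
    fit <- tryCatch(kmeans(x, centers = k), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  stop("kmeans failed on all ", max_tries, " attempts")
}

set.seed(1)
x <- matrix(runif(200 * 3), ncol = 3)   # stand-in data
fit <- kmeans_retry(x, k = 10)
fit$size                                # cluster sizes of the accepted fit
```

If every attempt fails, that may be a hint that the requested number of 
clusters is too large for the data rather than bad luck with the start.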

Please do read the description of a bug in the R FAQ, and do not misuse 
the term to mean `something I do not understand'.

On Mon, 10 Nov 2003, Murad Nayal wrote:

> I have been getting the following intermittent error from kmeans:
> 
> >str(cavint.p.r)
>  num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:1967] "6" "49" "87" "102" ...
>   ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> > set.seed(34)
> > kmeans(cavint.p.r,centers=34)
> Error: empty cluster: try a better set of initial centers
> 
> The seed being equal to the number of centers in this case is just a
> coincidence. I've encountered the same error with or without setting the
> seed, at different numbers of clusters.
> 
> There is nothing particularly unusual about cavint.p.r (no NAs, NULLs),
> except maybe for the fact that the rows sum to 1.
> 
> > sum(is.na(cavint.p.r))
> [1] 0
> > sum(is.nan(cavint.p.r))
> [1] 0
> > 
> 
> I thought kmeans should select initial centers from the data if not
> given explicitly! Any idea what might be going wrong?

And what makes you think it did not?
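
To make the point concrete, here is a small sketch of doing by hand what 
the help page describes for a single number in 'centers': picking a random 
set of distinct rows of the data as initial centres.  The names x, ux, 
init and fit2 are illustrative, and x is stand-in data.  Starting from 
data points is no guarantee by itself; the error can presumably still 
occur for some starts, so the call is wrapped in tryCatch().

```r
set.seed(2)
x  <- matrix(runif(200 * 3), ncol = 3)   # stand-in data, not cavint.p.r
k  <- 10

ux   <- unique(x)                        # drop duplicate rows first
init <- ux[sample(nrow(ux), k), , drop = FALSE]   # k distinct data rows

# Pass the chosen rows explicitly; keep the error object if it fails.
fit2 <- tryCatch(kmeans(x, centers = init), error = function(e) e)
if (inherits(fit2, "error")) {
  message("failed: ", conditionMessage(fit2))
} else {
  print(fit2$size)
}
```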

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



