[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2

Dr. Detlef Groth groth at mpimp-golm.mpg.de
Wed Mar 13 13:45:09 CET 2013


Hello,

Here is a working reproducible example which crashes R using kmeans or 
gives empty clusters using the nstart option with R 2.15.2.


library(cluster)
kmeans(ruspini,4)
kmeans(ruspini,4,nstart=2)
kmeans(ruspini,4,nstart=4)
kmeans(ruspini,4,nstart=10)
?kmeans

Either we always get empty clusters or, after some further commands, 
a segfault.
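
To check for empty clusters programmatically rather than by reading the printed output, something like the following could be used. This is only a sketch: the seed value and the variable names fit and empty are illustrative assumptions, and the sizes reported will depend on the R version in use.

```r
library(cluster)   # provides the ruspini data set

set.seed(42)       # arbitrary seed, only to make the run repeatable

# Same call as above, with the arguments named explicitly
fit <- kmeans(ruspini, centers = 4, nstart = 10)

# Number of points assigned to each of the 4 clusters
print(fit$size)

# TRUE if at least one cluster ended up with no points
empty <- any(fit$size == 0)
print(empty)
```

If empty is TRUE, or the call itself warns about an empty cluster, the run shows the problem described here.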

regards,
Detlef Groth

------------


[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 9 20:52:19 CET 2013


We need a reproducible example.

Uwe Ligges


On 03.02.2013 15:03, Luca Nanetti wrote:
> Dear experts,
> I am encountering a version-dependent issue.
>
> My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained below
> never occurred with this version of R.
> My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to this
> setup.
>
> The data I'm clustering consists of the rows of a 320 x 6 matrix
> containing integers ranging from 1 to 7, no missing data.
> I applied kmeans() to this matrix, literally, 256 x 10⁶ times using R
> version 2.13.2 or 2.14.1, without ever experiencing the slightest problem.
> My usual setup is with k=5, nstart=256, iter.max=50.
>
> Upgrading to R 2.15.2, I experienced either a warning message ('Empty
> cluster. Choose a better set of initial centers') or a catastrophic
> segfault. The only way I can get a solution whatsoever is setting nstart to
> its default value, i.e. 1. However, just repeating the clustering, the same
> issue still happens. Moreover, this is vastly suboptimal, because of the
> risk of local minima.
>
> Something similar was reported many years ago, see
> https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
> then suggested that R's behaviour was correct. I'm not familiar with such
> an early R version, but the up-to-date documentation of kmeans clearly
> states that "Except for the Lloyd-Forgy method, k clusters will always be
> returned if a number is specified.".
> I am using the default Hartigan-Wong, and I specify an exact number k:
> thus, k clusters should be returned. They aren't, and the empty cluster is
> then more likely the symptom of a bug than the outcome of a 'true'
> local minimum.
>
> Using synaptic, I managed to downgrade R to version 2.13.2. The problem
> disappeared, i.e. the previous message/segfault didn't occur anymore.
>
> Summarizing: given the same dataset, either an unreasonable message or a
> segfault regularly happens in version 2.15.2 when invoking kmeans() on an
> Ubuntu 11.10 64bit machine. This does not happen at all in previous
> versions of R, on the same machine and operating system.
>
> I respectfully suggest that the behaviour shown in the aforementioned
> versions 2.13.2 and 2.14.1 should be considered 'normal', and that version
> 2.15.2 should revert to that.
>
> Kind regards,
> Luca Nanetti.
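
For anyone trying to reproduce the report without the original data, the setup described in the quoted message might be approximated as below. This is only a sketch under stated assumptions: the matrix is filled with random integers (the real data were not posted), and the seed and the names x, fit and warnings_seen are made up for illustration.

```r
set.seed(1)  # arbitrary seed, not from the original report

# Synthetic stand-in for the data: 320 x 6 matrix of integers from 1 to 7
x <- matrix(sample(1:7, 320 * 6, replace = TRUE), nrow = 320, ncol = 6)

# Collect warnings (e.g. an empty-cluster message) instead of just printing them
warnings_seen <- character(0)
fit <- withCallingHandlers(
  # Settings from the report: k = 5, nstart = 256, iter.max = 50
  kmeans(x, centers = 5, nstart = 256, iter.max = 50),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(fit$size)       # cluster sizes; a zero would indicate an empty cluster
print(warnings_seen)  # any warnings raised during the runs
```

Random data may well not trigger the problem, so a run that finishes cleanly here does not rule out the reported behaviour.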


