[R] Kmeans performance difference
Moisan Yves
ymoisan at groupesm.com
Wed Jul 4 20:58:15 CEST 2007
Hi All,
A question from a newbie using R 2.5.0 on Windows XP. Why does kmeans
clustering, with apparently the exact same parameters, behave so
differently in the following two examples?
> cl1 <- kmeans(subset(pointsUXO15555, select = c(2:4)), 10)
This takes about 2 seconds to deliver a result.
> cl1 <- clust(subset(pointsUXO15555, select = c(2:4)), k=10, method="kmeansHartigan")
This dies after about 10 minutes, having filled up the RAM:
*** running kmeansHartigan cluster algorithm...
*** calculating validity measure...
Error: cannot allocate vector of size 922.9 Mb
In addition: Warning messages:
1: Reached total allocation of 1023Mb: see help(memory.size)
2: Reached total allocation of 1023Mb: see help(memory.size)
3: Reached total allocation of 1023Mb: see help(memory.size)
4: Reached total allocation of 1023Mb: see help(memory.size)
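A rough sanity check on the 922.9 figure in that error, as a sketch only. I am assuming (not confirmed from the clustTool source) that the validity step builds a full pairwise distance object like the one dist() returns, which stores n*(n-1)/2 doubles of 8 bytes each. The names n, n_pairs and size_mb are just for illustration.

```r
# Assumption: the validity measure needs all pairwise distances,
# stored the way dist() stores them (lower triangle only).
n <- 15555                       # rows in the data frame
n_pairs <- n * (n - 1) / 2       # number of pairwise distances
size_mb <- n_pairs * 8 / 1024^2  # 8 bytes per double, in R's "Mb" units
round(size_mb, 1)                # 922.9
```

If that assumption holds, the failed allocation would be the distance object itself, which would explain why only the clust call runs out of memory.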
If I understand correctly, both methods should give roughly the same
results (modulo the initial random locations), since the default
algorithm in kmeans is "Hartigan-Wong". My data frame has 15555 rows
and 3 columns. I gather that kmeans is a "core" R function, whereas
clust is from the clustTool package, but isn't clustTool simply wrapping
the core kmeans method? Why such a difference?
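To rule out the kmeans settings themselves, here is a minimal sketch on made-up data. The matrix pts is a synthetic stand-in of the same shape as my data frame, not the real pointsUXO15555, and the seeds and iter.max value are arbitrary choices. It only checks that naming the algorithm explicitly changes nothing compared with the default.

```r
# Synthetic stand-in for the real data: 15555 rows, 3 numeric columns.
set.seed(1)
pts <- matrix(rnorm(15555 * 3), ncol = 3)

# Default algorithm, then the same algorithm named explicitly.
# Resetting the seed gives both calls the same random starting centers.
set.seed(42)
fit_default <- kmeans(pts, centers = 10, iter.max = 50)
set.seed(42)
fit_hw <- kmeans(pts, centers = 10, iter.max = 50,
                 algorithm = "Hartigan-Wong")

# Same seed and same algorithm, so the partitions should match.
identical(fit_default$cluster, fit_hw$cluster)
```

This says nothing about what clust does internally; it only suggests that the clustering step on data of this size is not where the time and memory go.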
TIA,
Yves Moisan