[R] cluster size
Christian Hennig
chrish at stats.ucl.ac.uk
Fri Dec 11 16:47:13 CET 2009
Dear Ms Karunambigai,
the kmeans algorithm depends on random initialisation.
There are two basic strategies that can be applied in order to make your
results reproducible:
1) Fix the random number generator by means of set.seed (see ?set.seed)
before you run kmeans. The drawback is that your solution can only be
reproduced by using the same random seed; technically it is still a
random result.
2) Specify fixed initial centers, using the centers argument in kmeans.
(Sensible initial centers may be obtained by running hclust with Ward's
method, cutting the tree into the desired number of clusters with cutree,
and computing the centers (means) of the resulting clusters; sorry that I
don't have the time right now to explain how to do that precisely; the
help pages and hopefully some understanding of what is going on may help
you further.)
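
A minimal sketch of both strategies, under some assumptions: the matrix x
below is simulated toy data standing in for the questionnaire columns of
as1, k = 3 and the seed value are arbitrary, and method = "ward.D2" is the
name used by current versions of hclust (older versions of R called the
Ward option "ward"). It is meant as an illustration, not as the one right
way to do this.

```r
# Toy data (hypothetical): three well-separated groups in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(100, mean = 8), ncol = 2))
k <- 3

# Strategy 1: set the same seed immediately before each kmeans call.
set.seed(20091211)
fit_a <- kmeans(x, centers = k)
set.seed(20091211)
fit_b <- kmeans(x, centers = k)
identical(fit_a$cluster, fit_b$cluster)

# Strategy 2: fixed initial centers taken from a Ward clustering.
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = k)
# Column-wise group means: a k x p matrix, one row per initial center.
init <- apply(x, 2, function(col) tapply(col, groups, mean))
fit_c <- kmeans(x, centers = init)
fit_d <- kmeans(x, centers = init)
identical(fit_c$cluster, fit_d$cluster)
table(fit_c$cluster)
```

With a matrix passed as centers, kmeans does not draw random starting
points, so repeated calls on the same data should give the same partition
without any seed.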
An alternative strategy that will not absolutely guarantee reproducibility
but will make your results more stable is to use kmeansruns in package fpc,
which is a wrapper that runs kmeans several times and returns the best
solution found (the one with the smallest within-cluster sum of squares).
That should reproduce its outcome with higher probability (though not
precisely 1).
I don't know whether the default value runs=100 is sufficient to give a
stable solution for your data, but increasing the runs parameter may help.
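
A rough sketch of how that might look, assuming the fpc package is
installed; x is again simulated toy data rather than the original as1
columns, and passing a single value as krange (to fix the number of
clusters at 3) is my assumption about how to use the function here. The
cluster labels themselves can be permuted from one call to the next, so
the comparison below looks at the sorted cluster sizes.

```r
# Only run if fpc is available; install.packages("fpc") otherwise.
if (requireNamespace("fpc", quietly = TRUE)) {
  # Toy data (hypothetical): three well-separated groups.
  set.seed(1)
  x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2),
             matrix(rnorm(100, mean = 8), ncol = 2))

  # Two independent calls, each keeping the best of 100 random starts.
  fit1 <- fpc::kmeansruns(x, krange = 3, runs = 100)
  fit2 <- fpc::kmeansruns(x, krange = 3, runs = 100)

  # Compare sorted cluster sizes, since labels may be permuted.
  print(sort(as.vector(table(fit1$cluster))))
  print(sort(as.vector(table(fit2$cluster))))
}
```

If the two sets of sizes differ on your own data, that suggests the number
of runs is too small for a stable solution.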
Cheers,
Christian
On Fri, 11 Dec 2009, karuna m wrote:
> hi r-help,
> i am doing kmeans clustering in stats. i tried for five clusters clustering using:
> kcl1 <- kmeans(as1[,c("contlife","somlife","agglife","sexlife",
> "rellife","hordlife","doutlife","symtlife","washlife",
> "chcklife","rptlife","countlife","coltlife","ordlife")], 5, iter.max = 10, nstart = 1,
> algorithm = "Hartigan-Wong")
> table(kcl1$cluster)
> Every time I get five clusters of different sizes, e.g. the first time with cluster sizes
> table(kcl1$cluster)
> 1 2 3 4 5
> 140 72 105 98 112
> second time with cluster sizes
> table(kcl1$cluster)
> 1 2 3 4 5
> 91 149 106 76 105
> and so on.
> I wish to know whether there is any function to get the same cluster sizes every time we do kmeans clustering.
> Thanks in advance.
> regards,
> Ms.Karunambigai M
> PhD Scholar
> Dept. of Biostatistics
> NIMHANS
> Bangalore
> India
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche