[R] kmeans function

Tomassini, Letizia tomassini at vetmed.wsu.edu
Thu Mar 27 17:18:52 CET 2014

I have tried to read the source code, but since I am neither a computer engineer nor a programmer I was not able to fully understand it. I wonder if I should look for somebody here on campus (Washington State University) who could read it for me. In any case, I think that David Carlson did a nice job of explaining what the R function is doing.
I can ask SAS support about the SAS part as well. That is a good idea!

From: Prof Brian Ripley [ripley at stats.ox.ac.uk]
Sent: Wednesday, 26 March 2014 23:45
To: Tomassini, Letizia; r-help at stat.math.ethz.ch
Subject: Re: [R] kmeans function

On 26/03/2014 20:01, Tomassini, Letizia wrote:
> I would like to understand why the fastclus procedure in SAS is affected by the initial order of the data. With the same dataset sorted in a different way, I get different cluster arrangements. I find this really disturbing. R seems to find a stable solution with nstart=100, but I do not know how R does this or how to replicate it in SAS. All I know so far is that proc fastclus uses k-means as well.
> Regarding R, for example, does it have a way of always choosing the same starting seeds? Does it reorder the dataset internally before running kmeans?
> I am interested in finding the clustering with the best global minimum and extracting the seeds from it. I need those seeds for subsequent solutions with other numbers of clusters (for example, to decide on a lower number of clusters and use specific seeds). Overall I am better at using SAS, and I am trying to learn this piece of clustering design from R so that I can implement it in SAS.
> Please let me know if you can help.
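
To make the question concrete, here is a minimal sketch in R of the two things being asked about: checking that the solution does not depend on row order when nstart is large, and pulling out the centres of the best solution to reuse as seeds. The simulated data and the variable names are assumptions for illustration only.

```r
set.seed(42)  # only so the illustration is reproducible

# Simulated data (an assumption): two well-separated groups in 2-D
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))
x_shuffled <- x[sample.int(nrow(x)), , drop = FALSE]  # same rows, new order

fit1 <- kmeans(x,          centers = 2, nstart = 100)
fit2 <- kmeans(x_shuffled, centers = 2, nstart = 100)

# Cluster labels are arbitrary, so compare the objective, not the labels
print(c(fit1$tot.withinss, fit2$tot.withinss))

# The centres of the best solution are an ordinary matrix that can be
# saved, exported, or passed back in as explicit starting centres
seeds <- fit1$centers
refit <- kmeans(x, centers = seeds)  # start from the saved seeds
```

Passing a matrix as `centers` bypasses the random choice of starting points, so the result then depends only on the data and those seeds.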

We (unlike SAS) provide you with source code, which is the definitive
documentation.  Please read it: it answers all your questions.  (Even
those who contributed to the implementation of kmeans would need to do
so to refresh their memories.)
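
As a rough illustration of what that source does when `centers` is given as a number: each start draws k distinct rows of the data at random as initial centres, and the run with the smallest total within-cluster sum of squares is kept. The sketch below mimics this by hand. The helper name `best_of_starts` and the simulated data are assumptions for illustration, not part of R, and the real code differs in detail.

```r
# Minimal sketch, not the actual stats::kmeans source: emulate nstart by
# calling kmeans() repeatedly with explicit random rows as starting centres.
best_of_starts <- function(x, k, nstart = 100) {   # hypothetical helper
  x <- as.matrix(x)
  best <- NULL
  for (i in seq_len(nstart)) {
    # Initial centres: k distinct rows chosen at random
    init <- x[sample.int(nrow(x), k), , drop = FALSE]
    fit <- kmeans(x, centers = init)
    # Keep the fit with the smallest total within-cluster sum of squares
    if (is.null(best) || fit$tot.withinss < best$tot.withinss) best <- fit
  }
  best
}

set.seed(1)  # only so the illustration is reproducible

# Simulated data (an assumption): three well-separated groups in 2-D
x <- rbind(matrix(rnorm(100, mean = 0),  ncol = 2),
           matrix(rnorm(100, mean = 5),  ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))

fit_manual  <- best_of_starts(x, k = 3, nstart = 25)
fit_builtin <- kmeans(x, centers = 3, nstart = 25)

# Compare the objective values reached by the two approaches
print(c(manual = fit_manual$tot.withinss, builtin = fit_builtin$tot.withinss))
```

Because the starting rows are drawn by position at random and only the best run is kept, sorting the data differently changes which rows are drawn but, with enough starts, not the solution that wins.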

As for why a SAS algorithm works the way it does: given the fees someone
is paying SAS on your behalf they should be willing to explain.

> Letizia
> ________________________________________
> From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of Ranjan Maitra [maitra.mbox.ignored at inbox.com]
> Sent: Wednesday, 26 March 2014 12:48
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] kmeans function
> On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
> <tomassini at vetmed.wsu.edu> wrote:
>> Hello
>> I need to ask some questions about the k-means clustering function. Mainly I would like to know why, when nstart is set large enough, kmeans always finds the same clustering arrangement, even when the input dataset is sorted in different ways or I take out a few observations. I cannot seem to recreate that when using SAS.
> Do you understand what kmeans does? Why would you expect otherwise?
> Besides, why does the function have to match SAS's output? (Do you
> know how SAS goes about initializing the algorithm?) In any
> case, should it not provide the correct answer (the global
> minimum, if possible)?
> Ranjan

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
