[R] kmeans function

David Carlson dcarlson at tamu.edu
Wed Mar 26 22:14:32 CET 2014


To add to Ranjan's reply: in a large data set, k-means can
potentially find different results even with large nstart= values.
But you are correct that, with a large enough value, the results
will be the same unless two solutions have exactly the same
between-cluster sum of squares (unlikely but not impossible).
Removing observations, however, could easily change the results,
although it may not in your data. If you are comparing to SAS PROC
FASTCLUS, the answer is that FASTCLUS does not appear to support
multiple starts. To match the results in R, you would have to run
FASTCLUS nstart times and choose the result with the maximum
between-cluster sum of squares (equivalently, the minimum total
within-cluster sum of squares, since the total sum of squares is
fixed for a given data set).
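To make the multiple-start idea concrete, here is a minimal sketch in Python (standard library only). It runs a basic k-means (Lloyd's algorithm, which is not the Hartigan-Wong default used by R's kmeans()) from several random starts. It then keeps the run with the largest between-cluster sum of squares, which is the same selection you would make by hand when rerunning FASTCLUS. The function names lloyd, sums_of_squares and kmeans_multistart are hypothetical and serve only as an illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def lloyd(data, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) started from k random data points."""
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest current center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stable: converged to a local optimum
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = mean(members)
    return labels, centers


def sums_of_squares(data, labels, centers):
    """Return (total, within, between); total = within + between."""
    grand = mean(data)
    total = sum(sq_dist(p, grand) for p in data)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return total, within, total - within


def kmeans_multistart(data, k, nstart, seed=0):
    """Run k-means nstart times; keep the run with max between-cluster SS."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        labels, centers = lloyd(data, k, rng)
        _, within, between = sums_of_squares(data, labels, centers)
        if best is None or between > best[2]:
            best = (labels, centers, between, within)
    return best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centers, between, within = kmeans_multistart(pts, k=2, nstart=10)
    print(labels, round(within, 4))
```

With two well-separated groups like the toy points above, reversing or shuffling the input order still yields the same best partition once nstart is large enough. This is the behavior described in the question. With overlapping clusters, or after removing observations, the best local optimum can genuinely change.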

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ranjan Maitra
Sent: Wednesday, March 26, 2014 2:48 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] kmeans function

On Wed, 26 Mar 2014 18:35:34 +0000 "Tomassini, Letizia"
<tomassini at vetmed.wsu.edu> wrote:

> 
> Hello
> I need to ask some questions about the k-means clustering function.
> Mainly, I would like to know why, with nstart= set to a large
> enough number, kmeans always finds the same clustering arrangement,
> even when the input data set is sorted in different ways or when I
> take out a few observations. I cannot seem to recreate that when
> using SAS.

Do you understand what kmeans does? Why would you expect
otherwise? Besides, why does the function have to match SAS's
output? (Do you know how SAS goes about initializing the
algorithm?) In any case, shouldn't it provide the correct answer
(the global minimum, if possible)?

Ranjan


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.



