[R] Dynamic clustering?

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Wed May 5 23:52:58 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
> Sent: Wednesday, May 05, 2010 2:33 PM
> To: Ralf B
> Cc: r-help at r-project.org
> Subject: Re: [R] Dynamic clustering?
> 
> Hello,
> 
> Ralf B wrote:
> > Are there R packages that allow for dynamic clustering, i.e. where the
> > number of clusters is not predefined? I have a list of numbers that
> > falls into either two clusters or just one. Here is an example of one that
> > should be clustered into two clusters:
> >
> > two <- c(1,2,3,2,3,1,2,3,400,300,400)
> >
> > and here one that only contains one cluster and would therefore not
> > need to be clustered at all.
> >
> > one <- c(400,402,405, 401,410,415, 407,412)
> >
> > Given a sufficiently large amount of data, a statistical test or an
> > effect size should be able to determine whether it makes sense to
> > divide a data set, i.e. whether there are two groups that differ enough. I am
> > not familiar with the underlying techniques in kmeans, but I know that
> > it blindly divides both data sets based on the predefined number of
> > clusters. Are there any more sophisticated methods that allow me to
> > determine the number of clusters in a data set based on statistical
> > tests or effect sizes?
> 
<<<snip>>>

Ralf,

There is no procedure in R or any other statistics package that can make these kinds of decisions without a much fuller specification of the problem.  You give two examples above.  What would you want done with

c(380, 400, 402, 405, 401, 410, 415, 407, 412), or
c(350, 400, 402, 405, 401, 410, 415, 407, 412), or
c(300, 400, 402, 405, 401, 410, 415, 407, 412), or
c(100, 400, 402, 405, 401, 410, 415, 407, 412), or
...

i.e. what difference counts as big enough or variable enough or ...? 
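
To make the point concrete, here is a minimal sketch of one possible decision rule. It is only an illustration, not a recommended method: the function name largest_gap_split and its ratio parameter are hypothetical, and the rule itself (split when the largest gap between sorted neighbours exceeds ratio times the median gap) is an assumption chosen for simplicity.

```r
# Hypothetical rule (an assumption for illustration only): call the data
# "two clusters" when the largest gap between sorted neighbours is more
# than `ratio` times the median gap.
largest_gap_split <- function(x, ratio) {
  gaps <- diff(sort(x))            # distances between sorted neighbours
  max(gaps) > ratio * median(gaps) # TRUE means "split into two groups"
}

# The four vectors from the message above; only the first value differs.
candidates <- list(
  c(380, 400, 402, 405, 401, 410, 415, 407, 412),
  c(350, 400, 402, 405, 401, 410, 415, 407, 412),
  c(300, 400, 402, 405, 401, 410, 415, 407, 412),
  c(100, 400, 402, 405, 401, 410, 415, 407, 412)
)

# The same data give different answers depending on the chosen ratio.
res5  <- sapply(candidates, largest_gap_split, ratio = 5)   # TRUE  TRUE  TRUE  TRUE
res10 <- sapply(candidates, largest_gap_split, ratio = 10)  # FALSE TRUE  TRUE  TRUE
res50 <- sapply(candidates, largest_gap_split, ratio = 50)  # FALSE FALSE FALSE TRUE
print(rbind(res5, res10, res50))
```

Nothing in the data says whether 5, 10, or 50 is the right ratio; that choice is exactly the extra specification the problem needs.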

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



