[R] Dynamic clustering?
Erik Iverson
eriki at ccbr.umn.edu
Wed May 5 23:32:46 CEST 2010
Hello,
Ralf B wrote:
> Are there R packages that allow for dynamic clustering, i.e. where the
> number of clusters is not predefined? I have a list of numbers that
> falls into either 2 clusters or just 1. Here is an example of one that
> should be clustered into two clusters:
>
> two <- c(1,2,3,2,3,1,2,3,400,300,400)
>
> and here one that only contains one cluster and would therefore not
> need to be clustered at all.
>
> one <- c(400,402,405, 401,410,415, 407,412)
>
> Given a sufficiently large amount of data, a statistical test or an
> effect size should be able to determine whether it makes sense to
> divide a data set, i.e. whether there are two groups that differ enough.
> I am not familiar with the underlying techniques in kmeans, but I know
> that it blindly divides both data sets based on the predefined number
> of clusters. Are there any more sophisticated methods that allow me to
> determine the number of clusters in a data set based on statistical
> tests or effect sizes?
Caveat: I have very little experience with clustering methods, but maybe
this could get you started:
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
If you only want to make 2 clusters when the means of the data are an
order of magnitude apart or more, that's easy enough to do without a
statistical test.
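As a rough sketch of that idea in base R: split the sorted values at the
largest gap and call it two clusters only if the upper group's mean is at
least ten times the lower group's mean. The function name n_clusters_by_gap
and the ratio argument are made up for illustration, and the comparison only
makes sense for positive data like the examples above.

```r
# Hypothetical helper, not from any package.
# Assumes all values are positive, so a ratio of means is meaningful.
n_clusters_by_gap <- function(x, ratio = 10) {
  xs <- sort(x)
  if (length(xs) < 2) return(1L)
  cut <- which.max(diff(xs))      # position of the largest gap
  lower <- xs[seq_len(cut)]       # values below the gap
  upper <- xs[-seq_len(cut)]      # values above the gap
  if (mean(upper) >= ratio * mean(lower)) 2L else 1L
}

two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)
one <- c(400, 402, 405, 401, 410, 415, 407, 412)

n_clusters_by_gap(two)  # 2
n_clusters_by_gap(one)  # 1
```

The factor of 10 is an arbitrary cutoff standing in for "an order of
magnitude"; it is not a statistical test.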
For your examples above, I naively tried some functions in the mclust
package, which I've never used before:
library(mclust)
# Pick the best model by BIC among 1 or 2 mixture components, then read off G
mclustModel(one, mclustBIC(one, G = 1:2))$G  # gives 1
mclustModel(two, mclustBIC(two, G = 1:2))$G  # gives 2
You'll have to decide for yourself whether this is appropriate for your
data... or whether I'm even using these functions correctly. :)