[R] Elbow criterion

Matt Shotwell matt.shotwell at Vanderbilt.Edu
Mon Jun 20 16:05:08 CEST 2011


On Mon, 2011-06-20 at 13:38 +0200, Dominik P.H. Kalisch wrote:
> Hi,
> 
> I would like to cluster a dataset with the ward algorithm.

I'm assuming that this refers to the agglomerative partitioning method
[1]. That is, the number of clusters is selected according to the data
partition that is sequentially optimal with respect to an `objective
function'. In order to apply the elbow criterion, it should be possible
to optimize over subsets of all possible data partitions where the
number of clusters is fixed.

Although the Ward method yields a sequence of data partitions with a
decreasing number of clusters, there is no guarantee that _any_ of these
partitions is optimal (except sequentially, of course). Applying the
elbow method post hoc seems dubious, but maybe no more so than the Ward
method itself.
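
With that caveat in mind, here is a minimal sketch of how such an elbow
plot could be drawn from a Ward hierarchy. The iris data, the range of k,
and the helper name wss_for_k are illustrative assumptions only. It also
assumes R >= 3.1.0, where the method is called "ward.D2" (older versions
used "ward"), and Euclidean distances.

```r
# Illustrative data (an assumption): four standardized numeric columns
x <- scale(iris[, 1:4])

# Ward clustering on Euclidean distances
hc <- hclust(dist(x), method = "ward.D2")

# Total within-cluster sum of squares for the k-cluster cut of the tree
# (wss_for_k is a hypothetical helper, not part of any package)
wss_for_k <- function(k) {
  cl <- cutree(hc, k = k)
  sum(sapply(split(as.data.frame(x), cl), function(g) {
    # squared deviations of each observation from its cluster centroid
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

ks  <- 1:10
wss <- sapply(ks, wss_for_k)

# The "elbow" is the k beyond which the curve flattens out
plot(ks, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

Because the partitions are nested, the curve can only decrease as k
grows; where it bends is a judgment call, not a formal criterion.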

There are clustering methods that optimize the data partition (w.r.t. a
likelihood/posterior) with a fixed number of clusters, for instance,
those based on finite mixture models. The elbow principle and method
seem more valid in this context. See the R package 'mclust', and the
CRAN task view for cluster analysis:

http://cran.r-project.org/web/views/Cluster.html
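
As a rough sketch of the mixture-model route, assuming the 'mclust'
package is installed: the iris data and the range G = 1:6 are
illustrative assumptions. Note that mclust compares fits by BIC rather
than by an elbow, but the BIC curve can be inspected in the same spirit.

```r
library(mclust)   # assumes the 'mclust' package is installed

x <- iris[, 1:4]  # illustrative data (an assumption)

# Fit Gaussian mixtures with 1 to 6 components. Each fit is obtained by
# EM for that fixed number of components.
fit <- Mclust(x, G = 1:6)

# BIC for every (number of components, covariance model) combination
plot(fit, what = "BIC")

fit$G           # number of components selected by BIC
fit$modelName   # covariance model selected by BIC
```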

> That works fine. But I can't find a method to plot the structure chart 
> to estimate the "elbow criterion" for the number of clusters.
> Can someone tell me how I can do it?
> 
> Thanks for your help.
> Dominik
> 

[1] Ward, J. H. (1963), “Hierarchical Grouping to Optimize an Objective
Function,” Journal of the American Statistical Association, 58, 236–244.

-- 
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158


