[R] hierarchical clustering: stopping rule
Felix Salfner
salfner at informatik.hu-berlin.de
Wed Mar 24 16:49:34 CET 2004
I'm using 'agnes' from the 'cluster' package to cluster my data
hierarchically.
I need to find out the 'optimal' number of clusters.
In 'Finding Groups in Data: An Introduction to Cluster Analysis', Kaufman
and Rousseeuw refer to a strategy proposed by R. Mojena ('Hierarchical
grouping methods and stopping rules: An evaluation', The Computer
Journal, 20(4), 1977).
Mojena describes group weighted average hierarchical clustering methods
with the following formula:
    d_is = (n_p / n_i) * d_ps + (n_q / n_i) * d_qs
where i is the index of the new group formed by merging groups p and q,
s is a third group, n_x is the number of objects in group x,
and d is the distance measure.
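
To make the update rule concrete, here is a minimal sketch in base R. The
function name update_group_average is hypothetical (it is not part of the
'cluster' package), and it assumes n_i = n_p + n_q, i.e. the new group
contains exactly the members of p and q.

```r
# Group-average update: distance from the merged group i (= p and q joined)
# to a third group s. n_p and n_q are the group sizes before the merge;
# d_ps and d_qs are the distances of p and q to s before the merge.
# Assumption: n_i = n_p + n_q.
update_group_average <- function(d_ps, d_qs, n_p, n_q) {
  n_i <- n_p + n_q
  (n_p / n_i) * d_ps + (n_q / n_i) * d_qs
}

# Example: p has 3 members, q has 1 member, so d_is = 0.75 * 2 + 0.25 * 6
update_group_average(d_ps = 2, d_qs = 6, n_p = 3, n_q = 1)
```

The larger group p pulls the new distance towards d_ps, which is what the
size weights n_p / n_i and n_q / n_i express.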
At each clustering step j, a_j = min_{i<m} (d_im), i.e. the smallest
distance between any two of the current groups.
My question now is:
are the values of agnes.object$height identical to the a_j defined above?
(Apart from the fact that the heights are permuted for drawing.)
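
One way to check this empirically is sketched below, under stated
assumptions: fusion_levels is a hypothetical helper (not from any package)
that runs a naive group-average clustering with the update formula above and
records a_j at every step; x is made-up example data; and agnes is called
with method = "average" on Euclidean distances. The sorted heights are then
compared with the recorded a_j.

```r
library(cluster)  # provides agnes()

# Naive group-average clustering that records the fusion level a_j of each
# step. 'd' is a dist object. Hypothetical helper, for illustration only.
fusion_levels <- function(d) {
  D <- as.matrix(d)
  diag(D) <- Inf               # a group is never merged with itself
  size <- rep(1, nrow(D))      # current group sizes (n_p)
  a <- numeric(0)
  while (nrow(D) > 1) {
    idx <- which(D == min(D), arr.ind = TRUE)[1, ]
    p <- min(idx); q <- max(idx)
    a <- c(a, D[p, q])         # a_j: smallest distance between two groups
    n_i <- size[p] + size[q]
    new <- (size[p] * D[p, ] + size[q] * D[q, ]) / n_i  # update formula
    D[p, ] <- new; D[, p] <- new; D[p, p] <- Inf        # p becomes group i
    size[p] <- n_i
    D <- D[-q, -q, drop = FALSE]; size <- size[-q]      # drop old group q
  }
  a
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)      # made-up example data, 20 points
ag <- agnes(x, method = "average")    # Euclidean distances by default

# Compare the sorted agnes heights with the a_j from the naive clustering
all.equal(sort(ag$height), fusion_levels(dist(x)))
```

Sorting is used here because the height component is stored in the order of
the banner/dendrogram layout rather than in merge order; for group-average
linkage the a_j are non-decreasing, so sorting should recover the merge
order. Data with tied distances may need a closer look.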
I also read the publication by Lance and Williams, who originally introduced the above notation, but it didn't help ...
Thanks for any hint ...
Felix Salfner