[R] hierarchical clustering: stopping rule

Felix Salfner salfner at informatik.hu-berlin.de
Wed Mar 24 16:49:34 CET 2004


I'm using 'agnes' from the 'cluster' package to cluster my data 
hierarchically.

I need to find out the 'optimal' number of clusters.

In 'Finding Groups in Data: An Introduction to Cluster Analysis', Kaufman 
and Rousseeuw refer to a strategy proposed by R. Mojena ('Hierarchical 
grouping methods and stopping rules: An evaluation', The Computer 
Journal, 20(4), 1977).

Mojena describes group weighted average hierarchical clustering methods 
with the following formula:

        n_p          n_q
d_is = ---- d_ps  + ---- d_qs
        n_i          n_i


where i is the index of the new group formed by merging groups p and q,
s is any third group, n_x is the number of objects in group x
(so n_i = n_p + n_q), and d is the distance measure.

At every clustering step j, the merge level is a_j = min_{i<m} (d_im),
i.e. the smallest distance between any two of the current groups.
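
To make the definitions concrete, here is a minimal sketch in Python of how I read the procedure. It is only an illustration of the update formula and of the levels a_j, not the code used by 'agnes'. The function name group_average_levels and the four toy points are my own assumptions for the example.

```python
# Sketch only: agglomeration with the size-weighted update
#   d_is = (n_p / n_i) * d_ps + (n_q / n_i) * d_qs
# This is NOT the agnes implementation; names and data are hypothetical.

def group_average_levels(dist):
    """Return the merge levels a_1, ..., a_{n-1} for a symmetric distance matrix."""
    n = len(dist)
    # Pairwise distances between active groups, keyed as (smaller id, larger id).
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}   # n_x for every active group x
    levels = []
    next_id = n                       # id given to the next merged group
    while len(size) > 1:
        # a_j: smallest distance between any two current groups
        (p, q), a_j = min(d.items(), key=lambda kv: kv[1])
        levels.append(a_j)
        n_p, n_q = size.pop(p), size.pop(q)
        n_i = n_p + n_q
        i = next_id
        next_id += 1
        # Distance from the new group i to every remaining group s
        for s in size:
            d_ps = d.pop((min(p, s), max(p, s)))
            d_qs = d.pop((min(q, s), max(q, s)))
            d[(s, i)] = (n_p / n_i) * d_ps + (n_q / n_i) * d_qs
        del d[(p, q)]
        size[i] = n_i
    return levels


# Toy data (made up): four points on a line, absolute difference as distance.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(a - b) for b in points] for a in points]
levels = group_average_levels(dist)
print(levels)
```

If agnes with method = "average" performs the same update, I would expect its height values, once sorted, to match levels computed this way; that is the point I would like to have confirmed.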


My question now is:

Are the values of agnes.object$height identical to the a_j defined above 
(apart from the fact that the heights are permuted for drawing)?

I also read the publication by Lance and Williams, who originally introduced
the above notation, but it didn't help ...

Thanks for any hint ...

Felix Salfner



