[R-sig-eco] Do we have to cut a dendrogram at a specific level or not

Tyler Smith tyler.smith at eku.edu
Fri Mar 18 17:08:56 CET 2011


Jens Oldeland <oldeland at gmx.de> writes:

> I am currently preparing a lecture on 'Cluster Analysis' and I found
> two very different ways to interpret a dendrogram. The first option is
> to 'cut' a dendrogram at a specific height, like it is possible with
> the cluster package.
>
> The second option identifies the 'optimal clusters' at different
> heights, for example see McCune et al. 2002 Analysis of Ecological
> Communities Figure 11.3.
>
> Now these are two very different ways of interpreting and I am
> wondering which one is 'allowed' or perhaps the more practical way? Is
> it possible to combine both? I.e. first search for the optimal cut
> level and then adjust each clustering height by aggregation at higher
> levels?
>

There's really no hard and fast rule to follow with hierarchical
clustering. It's essentially a descriptive technique, and a good
interpretation depends as much on your understanding of the system as
on the actual shape of the dendrogram. Even if you take the 'objective'
route and cut the tree at a specific height, you still have to choose
the height. Borcard et al. (2011) provide some nice tools for picking
that height, by the way. They also emphasize that even their favourite
tools don't always produce the most interpretable clusters.
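
To make the "you still have to choose the height" point concrete, here
is a minimal sketch. The data are simulated and purely hypothetical, and
average silhouette width is just one commonly used criterion that I am
assuming for illustration; it is not necessarily the procedure Borcard
et al. recommend. hclust(), cutree() and dist() are base R, and
silhouette() comes from the cluster package.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Hypothetical data: 30 sites x 2 variables, forming three loose groups
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2)
)
d  <- dist(x)                        # Euclidean distances between sites
hc <- hclust(d, method = "average")  # UPGMA dendrogram

# Option 1: cut at a height you picked yourself (h = 2 is arbitrary here)
groups_h <- cutree(hc, h = 2)

# Option 2: let a criterion suggest the number of groups.
# For each candidate k, compute the average silhouette width.
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})
best_k   <- ks[which.max(avg_sil)]
groups_k <- cutree(hc, k = best_k)

# Compare the two partitions
table(by_height = groups_h, by_silhouette = groups_k)
```

Either way a choice is still being made: the height in option 1, or the
criterion and linkage method in option 2.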

One possibly reasonable approach would be to make your first division
based on a specific height on the dendrogram, and then interpret any
"sub-clusters" that are nested within your main groups: "cutting the
tree at the 1.1 level reveals three main groups: coniferous, mixed and
deciduous forests. Within the deciduous forest there are two additional
clusters (distinct at the 0.8 level): maple forests and oak forests".
Once the dendrogram is drawn, it's really up to the ecologist to
determine how best to interpret it.
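
The two-level reading described above can be sketched in R as follows.
The data are simulated and hypothetical, and the cut heights (4 and 1.2)
are chosen to suit these made-up data, not taken from the forest example.
Because the tree is hierarchical, every group from the lower cut falls
inside exactly one group from the higher cut.

```r
set.seed(42)
# Hypothetical data: three main groups; the third holds two sub-groups
x <- rbind(
  matrix(rnorm(20, mean = 0,    sd = 0.2), ncol = 2),  # main group A
  matrix(rnorm(20, mean = 5,    sd = 0.2), ncol = 2),  # main group B
  matrix(rnorm(20, mean = 10,   sd = 0.2), ncol = 2),  # group C, sub-group 1
  matrix(rnorm(20, mean = 11.5, sd = 0.2), ncol = 2)   # group C, sub-group 2
)
hc <- hclust(dist(x), method = "average")

main <- cutree(hc, h = 4)    # coarse cut: the main groups
sub  <- cutree(hc, h = 1.2)  # finer cut: sub-clusters within main groups

# Cross-tabulate to see which sub-clusters sit inside which main group
nesting <- table(main = main, sub = sub)
nesting
```

If you want to show this on the plot, plot(hc) followed by
rect.hclust(hc, h = 4) draws boxes around the main groups.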

An important point in all this is that clustering is primarily an
exploratory/descriptive technique, rather than a confirmatory test. From
Borcard et al.: "Clustering is not a typical statistical method in that
it does not test any hypothesis. Clustering helps bring out some
features hidden in the data; it is the user who decides if these
structures are interesting and worth interpreting in ecological terms."

HTH,

Tyler

Borcard, Gillet and Legendre. 2011. Numerical Ecology with R. Springer.


