agnes {cluster} | R Documentation |

Computes agglomerative hierarchical clustering of the dataset.

agnes(x, diss = inherits(x, "dist"), metric = "euclidean", stand = FALSE, method = "average", par.method, keep.diss = n < 100, keep.data = !diss)

`x` |
data matrix or data frame, or dissimilarity matrix, depending on the
value of the `diss` argument. In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed. In case of a dissimilarity matrix, |

`diss` |
logical flag: if TRUE (default for |

`metric` |
character string specifying the metric to be used for calculating
dissimilarities between observations.
The currently available options are "euclidean" and "manhattan".
Euclidean distances are root sum-of-squares of differences, and
manhattan distances are the sum of absolute differences.
If |

`stand` |
logical flag: if TRUE, then the measurements in |

`method` |
character string defining the clustering method. The six methods
implemented are "average" ([unweighted pair-]group average method, UPGMA),
"single" (single linkage), "complete" (complete linkage),
"ward" (Ward's method), "weighted" (weighted average linkage) and
its generalization |

`par.method` |
if |

`keep.diss, keep.data` |
logicals indicating if the dissimilarities
and/or input data |

`agnes` is fully described in chapter 5 of Kaufman and Rousseeuw (1990).
Compared to other agglomerative clustering methods such as `hclust`,
`agnes` has the following features: (a) it yields the
agglomerative coefficient (see `agnes.object`),
which measures the amount of clustering structure found; and (b)
apart from the usual tree it also provides the banner, a novel
graphical display (see `plot.agnes`).
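As a minimal sketch of feature (a) and of the relation to `hclust`, the following reads the agglomerative coefficient from the fitted object and converts the result to class `"hclust"`. It assumes the `votes.repub` data shipped with the package and uses the `ac` component and the `as.hclust` method for these objects; the variable names are illustrative.

```r
library(cluster)

data(votes.repub)
agn <- agnes(votes.repub, method = "average")

# Agglomerative coefficient: a number between 0 and 1,
# larger values indicating more clustering structure
agn$ac

# Convert to an "hclust" object for tools written around hclust()
hc <- as.hclust(agn)
class(hc)
```

The banner itself is drawn by `plot(agn)`; see `plot.agnes`.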

The `agnes` algorithm constructs a hierarchy of clusterings.

At first, each observation is a small cluster by itself. Clusters are
merged until only one large cluster remains which contains all the
observations. At each stage the two *nearest* clusters are combined
to form one larger cluster.
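The merging process described above can be inspected on a toy dataset. This is a sketch only: the five points are made up for illustration, and single linkage is chosen because the merge heights are easy to verify by hand.

```r
library(cluster)

# Toy data (illustrative): five collinear points in the plane
x <- cbind(u = c(0, 1, 5, 6, 10), v = 0)
agn <- agnes(x, method = "single")

# One row per merge step; a negative entry -j denotes the single
# observation j, a positive entry k the cluster formed at step k
agn$merge

# Dissimilarities at which the n - 1 merges take place
sort(agn$height)
```

With n = 5 observations there are n - 1 = 4 merge steps, after which one cluster containing all observations remains.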

For `method="average"`, the distance between two clusters is the
average of the dissimilarities between the points in one cluster and the
points in the other cluster.

In `method="single"`, we use the smallest dissimilarity between a
point in the first cluster and a point in the second cluster (nearest
neighbor method).

When `method="complete"`, we use the largest dissimilarity
between a point in the first cluster and a point in the second cluster
(furthest neighbor method).
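The three between-cluster distances just defined can be computed by hand in base R. The helper `cluster_dist` and the toy values below are hypothetical, written only to illustrate the definitions; they are not part of the package.

```r
# Toy data (illustrative): five points on a line
x <- c(0, 1, 5, 6, 10)
d <- as.matrix(dist(x))

# Hypothetical helper: dissimilarity between two clusters, each given
# as a vector of observation indices
cluster_dist <- function(d, a, b, method = c("average", "single", "complete")) {
  method <- match.arg(method)
  block <- d[a, b, drop = FALSE]   # all point-to-point dissimilarities
  switch(method,
         average  = mean(block),
         single   = min(block),
         complete = max(block))
}

a <- c(1, 2)   # cluster {0, 1}
b <- c(3, 4)   # cluster {5, 6}
avg <- cluster_dist(d, a, b, "average")
sgl <- cluster_dist(d, a, b, "single")
cpl <- cluster_dist(d, a, b, "complete")
```

Here the four pairwise dissimilarities are 5, 6, 4 and 5, so the average is 5, the smallest is 4 and the largest is 6.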

The option `method = "flexible"` allows (and requires) more detail:
The Lance-Williams formula specifies how dissimilarities are
computed when clusters are agglomerated (equation (32) in K. & R.,
p.237). If clusters *C_1* and *C_2* are agglomerated into a
new cluster, the dissimilarity between their union and another
cluster *Q* is given by

$$
D(C_1 \cup C_2, Q) = \alpha_1 D(C_1, Q) + \alpha_2 D(C_2, Q) + \beta D(C_1, C_2) + \gamma \, |D(C_1, Q) - D(C_2, Q)|,
$$

where the four coefficients *(α_1, α_2, β, γ)*
are specified by the vector `par.method`:

If `par.method` is of length 1,
say *= α*, `par.method` is extended to
give the “Flexible Strategy” (K. & R., p.236 f) with
Lance-Williams coefficients *(α_1 = α_2 = α, β =
1 - 2α, γ = 0)*.

If `par.method` is of length 3, *γ = 0* is used.
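The update rule and the length-1 expansion can be written out as a small base R sketch. Both helpers (`lance_williams`, `flexible_par`) and the numeric inputs are hypothetical and serve only to illustrate the formula; they do not reproduce the package's internal code. The coefficient sets used for single and complete linkage are the standard textbook values, not taken from this page.

```r
# Hypothetical helper: Lance-Williams update for D(C_1 u C_2, Q),
# with par = c(alpha_1, alpha_2, beta, gamma)
lance_williams <- function(d1q, d2q, d12, par) {
  stopifnot(length(par) == 4)
  par[1] * d1q + par[2] * d2q + par[3] * d12 + par[4] * abs(d1q - d2q)
}

# Hypothetical helper: expand a single alpha as in the Flexible Strategy
flexible_par <- function(alpha) c(alpha, alpha, 1 - 2 * alpha, 0)

# Illustrative dissimilarities: D(C_1, Q), D(C_2, Q), D(C_1, C_2)
d1q <- 4; d2q <- 10; d12 <- 3

flex_half <- lance_williams(d1q, d2q, d12, flexible_par(0.5))
single    <- lance_williams(d1q, d2q, d12, c(0.5, 0.5, 0, -0.5))
complete  <- lance_williams(d1q, d2q, d12, c(0.5, 0.5, 0,  0.5))
```

With α = 0.5 the β term vanishes and the result is the plain mean of the two dissimilarities; γ = -0.5 and γ = 0.5 recover the minimum and the maximum, respectively.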

**Care** and expertise are probably needed when using
`method = "flexible"`, particularly when `par.method` is
specified with length greater than one.
The *weighted average* (`method="weighted"`) is the same as
`method="flexible", par.method = 0.5`.
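The stated equivalence can be checked empirically. The sketch below assumes the `agriculture` data shipped with the package and compares the merge heights of the two fits; it is an illustration on one dataset, not a proof.

```r
library(cluster)

data(agriculture)
aw <- agnes(agriculture, method = "weighted")
af <- agnes(agriculture, method = "flexible", par.method = 0.5)

# Both fits should report the same merge heights and ordering
same_height <- isTRUE(all.equal(aw$height, af$height))
same_order  <- identical(aw$order, af$order)
```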

An object of class `"agnes"` (which extends `"twins"`)
representing the clustering. See `agnes.object` for
details, and methods applicable.
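A short sketch of what the returned value looks like, assuming the `agriculture` data from the package; the component names checked here are those documented in `agnes.object`.

```r
library(cluster)

data(agriculture)
agn <- agnes(agriculture)

class(agn)                         # "agnes" first, then "twins"
names(agn)                         # components of the fitted object
is_twins <- inherits(agn, "twins") # methods for "twins" also apply
```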

Cluster analysis divides a dataset into groups (clusters) of observations that are similar to each other.

- Hierarchical methods like `agnes`, `diana`, and `mona` construct a hierarchy of clusterings, with the number of clusters ranging from one to the number of observations.
- Partitioning methods like `pam`, `clara`, and `fanny` require that the number of clusters be given by the user.
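The difference between the two families can be sketched as follows. The choice of three clusters is arbitrary and made only for illustration; `cutree` from the stats package is used to cut the hierarchy after converting it with `as.hclust`.

```r
library(cluster)

data(agriculture)

# Hierarchical: fit once, then cut the tree at any number of clusters
agn <- agnes(agriculture)
hier3 <- cutree(as.hclust(agn), k = 3)

# Partitioning: the number of clusters must be supplied up front
part3 <- pam(agriculture, k = 3)$clustering

# Cross-tabulate the two assignments
table(hier3, part3)
```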

Kaufman, L. and Rousseeuw, P.J. (1990).
*Finding Groups in Data: An Introduction to Cluster Analysis*.
Wiley, New York.

Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996)
Clustering in an Object-Oriented Environment.
*Journal of Statistical Software* **1**.
http://www.jstatsoft.org/v01/i04

Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating
Robust Clustering Techniques in S-PLUS,
*Computational Statistics and Data Analysis*, **26**, 17–37.

Lance, G.N. and Williams, W.T. (1966).
A General Theory of Classificatory Sorting Strategies, I. Hierarchical
Systems.
*Computer J.* **9**, 373–380.

`agnes.object`, `daisy`, `diana`, `dist`, `hclust`, `plot.agnes`, `twins.object`.

    library(cluster)  # provides agnes(), daisy() and the example data

    data(votes.repub)
    agn1 <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
    agn1
    plot(agn1)

    op <- par(mfrow=c(2,2))
    agn2 <- agnes(daisy(votes.repub), diss = TRUE, method = "complete")
    plot(agn2)
    agnS <- agnes(votes.repub, method = "flexible", par.meth = 0.6)
    plot(agnS)
    par(op)

    ## Exploring the dendrogram structure
    (d2 <- as.dendrogram(agn2)) # two main branches
    d2[[1]] # the first branch
    d2[[2]] # the 2nd one  { 8 + 42  = 50 }
    d2[[1]][[1]] # first sub-branch of branch 1 .. and shorter form
    identical(d2[[c(1,1)]], d2[[1]][[1]])
    ## a "textual picture" of the dendrogram :
    str(d2)

    data(agriculture)
    ## Plot similar to Figure 7 in ref
    ## Not run: plot(agnes(agriculture), ask = TRUE)

[Package *cluster* version 1.14.3 Index]