[R] How to retrieve pairwise distances between clusters after cutting the tree?
David Carlson
dcarlson at tamu.edu
Tue Aug 6 22:54:04 CEST 2013
Assuming you are defining "distance between clusters" as the
distance between the centroids and you have the original data, you
can use aggregate() on the original data with the output from
cutree() as the grouping variable to create a new data.frame of
cluster centers (means). Then just run that through dist().
Something like
set.seed(42)
x <- matrix(runif(250), 25, 10)
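dist(aggregate(x, by=list(cutree(hclust(dist(x)), k=3)), mean))
#          1        2
# 2 1.297682
# 3 2.150580 1.380707
# Note: aggregate() returns the group labels as its first column
# (Group.1), so the distances above include that column. To measure
# distances between the centroids only, drop it before calling dist():
# dist(aggregate(x, by=list(cutree(hclust(dist(x)), k=3)), mean)[, -1])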
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Naxerova, Kamila
Sent: Tuesday, August 6, 2013 1:00 PM
To: r-help at r-project.org
Subject: [R] How to retrieve pairwise distances between clusters
after cutting the tree?
Dear all,
what would be the best way of retrieving distances between
individual clusters after cutting my tree of interest? $height from
the hclust object will give me the distance between clusters at
each agglomeration step, but let's say I have a situation where I
have six observations A, B, C, D, E, F. The clustering proceeds
1) {A,B}
2) {C,D}
3) {E,F}
4) {C,D,E,F}
5) {A,B,C,D,E,F}
but now I want to know the distance between {A,B} and {E,F}, which is
not directly recorded in $height.
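A small sketch of this situation, for illustration only. The 1-D
coordinates below are hypothetical, chosen so that hclust() with its
default complete linkage reproduces the merge order listed above. It
shows that $height holds one value per merge, and that the
complete-linkage distance between {A,B} and {E,F} is not among them.

```r
# Hypothetical coordinates (not from the original post), picked so the
# merges happen as {A,B}, {C,D}, {E,F}, {C,D,E,F}, then everything.
pts <- c(A = 0, B = 1, C = 16, D = 17.5, E = 10, F = 12)
d <- dist(pts)
h <- hclust(d)               # complete linkage is the default method

# One height per merge: 5 values for 6 observations
h$height                     # heights: 1, 1.5, 2, 7.5, 17.5

# Cutting into three clusters gives {A,B}, {C,D}, {E,F}
groups <- cutree(h, k = 3)

# Complete-linkage distance between {A,B} and {E,F}: the largest
# original distance between any member of one and any member of the other
m <- as.matrix(d)
ab_ef <- max(m[c("A", "B"), c("E", "F")])
ab_ef                        # 12
ab_ef %in% h$height          # FALSE: this pair was never merged directly
```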
I could find the distance by locating cluster members in the
original distance matrix, but is there a more direct way that I
might not be aware of? Something along the lines of
calc.pairwise.dist(cutree(hclust(dist),k=3))?
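A minimal sketch of the lookup described above, assuming the original
dist object is still available. The helper name cluster_dist() and its
arguments are hypothetical (base R has no such function). It summarises
the original distances between the members of each pair of clusters;
FUN = max, min, or mean corresponds to the complete, single, or average
linkage definition of between-cluster distance.

```r
# Hypothetical helper, not part of base R: for every pair of clusters,
# apply FUN to the original distances between their members.
cluster_dist <- function(d, groups, FUN = mean) {
  m <- as.matrix(d)
  ids <- sort(unique(groups))
  out <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (i in seq_along(ids)) {
    for (j in seq_along(ids)) {
      if (i != j) {
        # sub-matrix of distances: rows in cluster i, columns in cluster j
        out[i, j] <- FUN(m[groups == ids[i], groups == ids[j], drop = FALSE])
      }
    }
  }
  as.dist(out)
}

set.seed(42)
x <- matrix(runif(250), 25, 10)
d <- dist(x)
groups <- cutree(hclust(d), k = 3)

cluster_dist(d, groups, FUN = max)   # complete-linkage distances
cluster_dist(d, groups, FUN = min)   # single-linkage distances
cluster_dist(d, groups, FUN = mean)  # average-linkage distances
```

With FUN = max and a tree built by complete linkage, the smallest value
returned should match the height of the next merge above the cut, which
gives a quick sanity check.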
Many thanks in advance.
Kamila
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.