Fri Dec 11 15:57:41 CET 2009

```
Dear R community,
I am reposting this in case some of you missed my previous email.
I realize that hclust relies on a Fortran routine, but I hoped
some of you might know exactly how Y.hc_c$height
is computed, and could thus explain the anomaly I found.

Thank you.

J

Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harwell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE

-----Original Message-----
From: r-help-bounces at r-project.org on behalf of james.foadi at diamond.ac.uk
Sent: Thu 10/12/2009 13:26
To: r-help at r-project.org

Dear R community,
I would be grateful if somebody could shed light on the following.

I have created a set of 6 points to check how centroid
agglomeration works in cluster analysis:

> Y <- data.frame(x=c(-1,1,1,-1,10,12),y=c(1,1,-1,-1,0,0))

Intuitively, the last two clusters to be joined should be
{1,2,3,4} and {5,6}. The centroid of the first cluster is (0,0),
while the centroid of the second cluster is (11,0). Therefore, the
distance between these two clusters should be 11. But:

> Y.dist <- dist(Y)
> Y.hc_c <- hclust(Y.dist,method="centroid")
> Y.hc_c$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3    1
[3,]   -4    2
[4,]   -5   -6
[5,]    3    4
> Y.hc_c$height
[1] 2.000000 1.914214 1.517428 2.000000 9.692575

So it would appear that the distance between the last two clusters is 9.692575, not 11!
How can that be?
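
# A minimal sketch, assuming hclust() applies the Lance-Williams update
# for "centroid" directly to whatever dissimilarities it is given (the
# ?hclust help page says cluster distances are recomputed with the
# Lance-Williams formula). That update yields the distance between
# centroids only when the input is *squared* Euclidean distance.
# centroid_heights() is a hypothetical helper written for illustration,
# not part of R; ties are broken by taking the first pair scanned.
centroid_heights <- function(D) {
  D <- as.matrix(D)            # full symmetric matrix from a "dist" object
  n <- nrow(D)
  size <- rep(1, n)            # number of points in each cluster
  active <- rep(TRUE, n)       # FALSE once a cluster has been absorbed
  heights <- numeric(n - 1)
  for (step in seq_len(n - 1)) {
    # Closest pair (i < j) of active clusters; the first pair scanned wins ties.
    best <- Inf
    for (i in which(active)) for (j in which(active)) {
      if (i < j && D[i, j] < best) {
        best <- D[i, j]; bi <- i; bj <- j
      }
    }
    heights[step] <- best
    ai <- size[bi] / (size[bi] + size[bj])
    aj <- size[bj] / (size[bi] + size[bj])
    # Lance-Williams centroid update; row/column bi now stands for bi + bj:
    # d(k, bi+bj) = ai*d(k, bi) + aj*d(k, bj) - ai*aj*d(bi, bj)
    for (k in which(active)) {
      if (k != bi && k != bj) {
        D[bi, k] <- D[k, bi] <- ai * D[bi, k] + aj * D[bj, k] - ai * aj * best
      }
    }
    size[bi] <- size[bi] + size[bj]
    active[bj] <- FALSE
  }
  heights
}

Y <- data.frame(x = c(-1, 1, 1, -1, 10, 12), y = c(1, 1, -1, -1, 0, 0))
# Distance between the two final centroids, computed directly: 11
sqrt(sum((colMeans(Y[1:4, ]) - colMeans(Y[5:6, ]))^2))
# Plain Euclidean input: reproduces 2, 1.914214, 1.517428, 2, 9.692575
centroid_heights(dist(Y))
# Squared Euclidean input: the last value is 11 (up to rounding)
sqrt(centroid_heights(dist(Y)^2))
# The same reasoning suggests passing squared distances to hclust itself
sqrt(hclust(dist(Y)^2, method = "centroid")$height)

# Under this assumption the 9.692575 is not a bug in the Fortran code:
# the centroid update is only geometrically meaningful for squared
# Euclidean dissimilarities, so heights should be read on that scale.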

J
