[R-sig-Geo] [SOLVED: complete linkage Agglomerative hierarchical clustering]

Hans Ekbrand hans at sociologi.cjb.net
Wed Apr 21 13:38:57 CEST 2010


On Wed, Apr 21, 2010 at 09:59:51AM +0200, Hans Ekbrand wrote:

[...]

> head(clust.geo.test.tree$height, 70)
>  [1]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [11]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [21]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [31]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [41]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [51]   0.000000   0.000000   0.000000   0.000000   3.160631  18.963676  30.398644  32.232351  37.927539  44.987446
> [61]  50.065192  81.542472  82.691738  93.553729  95.971207 105.325405 115.218371 119.540239 125.235381 130.181302
> 
> As I understand this, the 54 zeroes represent identical coordinates.
> The positive numbers represent the distance in meters between points
> that have been grouped together at a certain level of the tree. Now, I
> am not interested in grouping together points with distances larger
> than 100 meters, so I would like to stop the clustering process at
> that point - or, after the hclust has completed, extract the clusters
> that were in effect at that level. In the above example that would be
> at level 65.
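
The reading of the height vector quoted above can be checked on a small example. This is only a sketch: the `pts` matrix and the 100 m threshold are made-up values for illustration, not the data from this thread. It relies on hclust() using complete linkage by default, where each entry of `$height` is the largest pairwise distance inside the cluster created by that merge and the heights never decrease.

```r
# Toy coordinates in metres (hypothetical, not the thread's data).
pts <- rbind(c(0, 0), c(0, 0), c(3, 4), c(200, 0), c(200, 50), c(1000, 1000))
tree <- hclust(dist(pts))  # default method = "complete"

threshold <- 100
# Heights are non-decreasing, so this counts the merges at or below the threshold.
n.merges <- sum(tree$height <= threshold)
# Every merge reduces the number of clusters by one.
n.clusters <- nrow(pts) - n.merges
```

Here `n.merges` plays the role of "level 65" in the quoted output: it is the last merge whose height stays within the threshold.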

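I found cutree(): its "h" argument cuts the tree at a given height, which
with hclust()'s default complete linkage is the largest distance between
any two points in a cluster. That solved it. Here's an example for the
archives.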

# Clustering
max.distance.in.same.cluster <- 100
print(load(url("http://sociologi.cjb.net/temp/clust.geo.test.RData")))
clust.geo.test.tree <- hclust(dist(clust.geo.test@coords))
my.cluster <- cutree(clust.geo.test.tree, h = max.distance.in.same.cluster)
# Which clusters have more than one member?
sort(unique(my.cluster[which(duplicated(my.cluster))]))

# How many members do these clusters have?
sapply(sort(unique(my.cluster[which(duplicated(my.cluster))])), function(x) {length(which(my.cluster == x))})

# Print a sorted list of the longest distances within each of these clusters.
sort(sapply(sort(unique(my.cluster[which(duplicated(my.cluster))])), function(x) {max(dist(clust.geo.test@coords[which(my.cluster == x),]))}))
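
For readers without access to the RData file, the same steps can be tried on a self-contained toy data set. This is a sketch under assumptions: `pts`, `max.dist` and the other variable names are invented for illustration, and table() is used here as a shorter way to count cluster members than the sapply() call above.

```r
# Toy coordinates in metres (hypothetical, not the thread's data).
pts <- rbind(c(0, 0), c(0, 0), c(3, 4), c(200, 0), c(200, 50), c(1000, 1000))
max.dist <- 100

tree <- hclust(dist(pts))          # default method = "complete"
cl <- cutree(tree, h = max.dist)   # cluster id for every point

# Cluster sizes, keeping only clusters with more than one member.
sizes <- table(cl)
multi <- sizes[sizes > 1]

# Largest within-cluster distance for each multi-member cluster.
diameters <- sapply(names(multi), function(id) {
  max(dist(pts[cl == as.integer(id), , drop = FALSE]))
})

# With complete linkage, no cluster obtained by cutting at h is wider than h.
stopifnot(all(diameters <= max.dist))
```

The final stopifnot() is the property that makes the cutree(h = ...) approach suitable here: it would not hold in general for single or average linkage.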


Thanks again, Roger, for the pointer to hclust().