[R-sig-Geo] complete linkage Agglomerative hierarchical clustering, nnclust, spatclus or something else?

Roger Bivand Roger.Bivand at nhh.no
Wed Apr 21 13:42:05 CEST 2010


On Wed, 21 Apr 2010, Hans Ekbrand wrote:

> On Tue, Apr 20, 2010 at 11:13:22PM +0200, Hans Ekbrand wrote:
>> Roger Bivand wrote:
>>> On Tue, 20 Apr 2010, Hans Ekbrand wrote:
>>>
>>>> I have just read about clustering on wikipedia, and learnt that what I
>>>> want is:
>>>>
>>>> Agglomerative hierarchical clustering, with complete linkage
>>>
>>> ?hclust  # hclust() is in stats, so library(cluster) is not needed for it
>
> print(load(url("http://sociologi.cjb.net/temp/clust.geo.test.RData")))
> clust.geo.test.tree <- hclust(dist(clust.geo.test@coords))
> clust.geo.test.tree$height
>
> head(clust.geo.test.tree$height, 70)
> [1]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [11]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [21]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [31]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [41]   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
> [51]   0.000000   0.000000   0.000000   0.000000   3.160631  18.963676  30.398644  32.232351  37.927539  44.987446
> [61]  50.065192  81.542472  82.691738  93.553729  95.971207 105.325405 115.218371 119.540239 125.235381 130.181302
>
> As I understand this, the 54 zeroes represent identical coordinates.
> The positive numbers represent the distance in meters between points
> that have been grouped together at a certain level of the tree. Now, I
> am not interested in grouping together points with distances larger
> than 100 meters, so I would like to stop the clustering process at
> that point - or, after hclust has completed, extract the clusters
> that were in effect at that level. In the above example that would be
> after merge 65 (height 95.97), the last one below 100.
>
> I didn't understand from the documentation of hclust how to accomplish
> that, can someone on the list help me?

So you do not want hclust at all, really. Look at dnearneigh() in spdep, 
setting a 100m bound. Then use n.comp.nb() to see which points belong to 
which graph component, using perhaps plot.nb with colours to distinguish 
the subgraphs.
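
A minimal sketch of that route, assuming planar coordinates in metres. The
coords matrix below is made up and only stands in for your own
clust.geo.test@coords:

```r
library(spdep)

# Hypothetical stand-in for clust.geo.test@coords (planar, in metres)
coords <- rbind(c(0, 0), c(50, 0), c(120, 0),
                c(1000, 1000), c(1040, 1000))

# Neighbour list: every pair of points at most 100 m apart is linked
nb <- dnearneigh(coords, d1 = 0, d2 = 100)

# Graph components: points connected directly or through a chain of links
comp <- n.comp.nb(nb)
comp$nc              # number of components
comp$comp.id         # component membership of each point
table(comp$comp.id)  # number of points per component

# Links in grey, points coloured by component
plot(nb, coords, col = "grey")
points(coords, col = comp$comp.id, pch = 19)
```

Note that a component can be wider than 100m in total, because points are
chained: the first and third points above are 120m apart but end up in the
same component through the second one.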

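If you should need complete-linkage groups after all, the clusters in effect
at a given height can be read off with cutree() and its h argument. A small
sketch on made-up coordinates (coords is hypothetical, not your data):

```r
# Hypothetical planar coordinates in metres
coords <- rbind(c(0, 0), c(50, 0), c(120, 0),
                c(1000, 1000), c(1040, 1000))

# hclust() uses complete linkage by default; spelled out here for clarity
tree <- hclust(dist(coords), method = "complete")

# Cluster membership after all merges with height <= 100 have been made
cl <- cutree(tree, h = 100)
cl
table(cl)  # number of points per cluster
```

With complete linkage no two points in the same cluster are more than 100m
apart, so the third point stays on its own here although it is only 70m from
the second one.
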
Roger

>
> The goal is to count, for each cluster, the number of fires and then
> to analyse how the fires within each cluster are distributed over time,
> and to count how many of them are too close in time to be
> considered independent.
>
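
Once each fire has a cluster id, the counting step could be sketched roughly
as below. The fires data frame, its column names and the 7 day threshold are
all assumptions made up for illustration:

```r
# Hypothetical data: one row per fire, with cluster id and date
fires <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 3),
  date = as.Date(c("2009-01-01", "2009-01-02", "2009-06-15",
                   "2009-03-10", "2009-03-10", "2009-08-01"))
)

# Number of fires per cluster
n_fires <- table(fires$cluster)

# Assumed threshold: fires fewer than 7 days apart count as "too close"
min_gap_days <- 7

# Per cluster: number of consecutive fires closer in time than the threshold
days <- as.numeric(fires$date)
close_in_time <- tapply(days, fires$cluster, function(d) {
  sum(diff(sort(d)) < min_gap_days)
})

n_fires
close_in_time
```

This only compares each fire with the next one in time within the same
cluster; whether that is the right notion of dependence is for you to decide.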

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
