[R-sig-Geo] clustering spatial point data

Georg Ruß research at georgruss.de
Sun Jun 5 19:42:36 CEST 2011


On 02/06/11 12:49:15, Tal Avgar wrote:
> I am looking for a code/function/algorithm for clustering spatial point data
> into two distinct groups, based on spatial coordinates and a measure of a
> continuous response variable at these locations. The requirement is for
> group members to be as similar as possible in their affiliated response
> values but also group members must be clustered in space so that there are
> no events belonging to one group within the space affiliated with the other.
> Any ideas?
> Thanks,
> Tal.

Hi Tal,

I think the HACC-spatial algorithm that I've developed may be of
interest to you. I use it for management zone delineation on precision
agriculture data sets, which are structurally the same as yours, i.e. a
SpatialPointsDataFrame, I guess.

The idea of HACC-spatial is rather simple: it's a hierarchical
agglomerative constrained clustering procedure, where the constraint (for
me) is spatial contiguity, i.e. the resulting clusters (or zones) should
be mostly contiguous. The trick is to proceed as in standard hierarchical
clustering but to consider only geospatially adjacent points/clusters for
merging. You may even be able to keep this constraint throughout the whole
run, whereas I had to switch it off at some point during the clustering.

The similarity of points is based on Euclidean distance (or any other
distance measure); the spatial distance is Euclidean, too. The algorithm
starts by generating a distance matrix for the spatial points using
"dist". It then looks for the most similar points (among those which are
allowed to be merged), merges them into a cluster and updates the distance
matrix accordingly. The average linkage criterion is used for determining
the similarity of clusters.
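
As a rough illustration of the steps above (this is not the published
HACC-spatial code), a minimal R sketch could look like the following. It
makes several assumptions that are not stated in this mail: the function
name constrained_hclust and the argument radius are hypothetical, two
points count as adjacent when they lie within radius of each other, the
similarity is computed on the response values only, and the average
linkage is recomputed from the original distance matrix at each step
instead of updating the matrix in place. The constraint is kept for the
whole run.

```r
# Sketch of spatially constrained average-linkage clustering (assumptions above).
# coords: n x 2 numeric matrix of point coordinates
# values: numeric vector (or matrix) of response values, one row per point
# k:      number of clusters to stop at
# radius: hypothetical adjacency threshold in coordinate units
constrained_hclust <- function(coords, values, k = 2, radius) {
  n <- nrow(coords)
  attr_d <- as.matrix(dist(values))          # dissimilarity of the responses
  adj <- as.matrix(dist(coords)) <= radius   # which points are spatial neighbours
  diag(adj) <- FALSE
  members <- as.list(seq_len(n))             # every point starts as its own cluster

  while (length(members) > k) {
    m <- length(members)
    best <- c(NA, NA)
    best_d <- Inf
    for (i in seq_len(m - 1)) {
      for (j in (i + 1):m) {
        a <- members[[i]]
        b <- members[[j]]
        if (!any(adj[a, b])) next            # constraint: merge adjacent clusters only
        d <- mean(attr_d[a, b])              # average linkage between the two clusters
        if (d < best_d) {
          best_d <- d
          best <- c(i, j)
        }
      }
    }
    if (is.infinite(best_d)) break           # no adjacent pair left to merge
    members[[best[1]]] <- c(members[[best[1]]], members[[best[2]]])
    members[[best[2]]] <- NULL
  }

  labels <- integer(n)
  for (g in seq_along(members)) labels[members[[g]]] <- g
  labels
}

# Toy data: a 6 x 3 grid whose left and right halves differ in the response.
coords <- as.matrix(expand.grid(x = 1:6, y = 1:3))
values <- ifelse(coords[, "x"] <= 3, 10, 20) + 0.1 * coords[, "y"]
zones  <- constrained_hclust(coords, values, k = 2, radius = 1)
table(zones, left_half = coords[, "x"] <= 3)
```

The double loop over cluster pairs is only meant to make the logic
readable; it is far too slow for real field data, where updating the
distance matrix after each merge, as described above, would be the way to go.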

You may have a look at the most recent version I've published here:
http://fuzzy.cs.uni-magdeburg.de/aigaion/index.php/publications/show/793
Maybe this helps. It's easy to implement in R. It could probably be
implemented in C or some other faster language and then called from R, but
I currently don't have the time to do that. The algorithm is likely to make
up about half of my PhD thesis.

Regards,
Georg.


