[R-sig-Geo] How to separate a set of geographical regions to multiple groups that are similar to each other but also maintains geographic contiguity?

Fri Jul 8 09:08:22 CEST 2022

Dear Danlin,

I think you need the SKATER (Spatial 'K'luster Analysis by Tree Edge
Removal) algorithm, which is implemented in the 'spdep' package
(function skater()) and performs spatially constrained clustering.
SKATER produces contiguous regions made up of neighboring polygons with
similar attribute values. It was published by Assuncao et al. (2006).
A tutorial is available here:
https://geodacenter.github.io/tutorials/spatial_cluster/skater.html
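
A minimal sketch of the skater() workflow, in case it helps. It runs on a
synthetic 6 x 6 grid instead of your shapefile, so the object name geo.sf,
the attribute columns A1 and A2, and the choice of 4 cuts (which yields 5
groups) are all assumptions for illustration only:

```r
library(sf)
library(spdep)

set.seed(1)

# Hypothetical stand-in for the real layer: a 6 x 6 grid of square polygons
# with two random attributes. Replace with your own sf object.
bbox <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 6, ymax = 6)))
grid <- st_make_grid(bbox, n = c(6, 6))
geo.sf <- st_sf(A1 = rnorm(36), A2 = rnorm(36), geometry = grid)

# Standardized attribute matrix used to measure dissimilarity
dat <- scale(st_drop_geometry(geo.sf)[, c("A1", "A2")])

# Contiguity neighbours (queen criterion) and attribute-based edge costs
nb <- poly2nb(geo.sf, queen = TRUE)
costs <- nbcosts(nb, dat)
w <- nb2listw(nb, glist = costs, style = "B")

# Minimum spanning tree over the neighbour graph, then prune 4 edges
mst <- mstree(w)
res <- skater(edges = mst[, 1:2], data = dat, ncuts = 4)

# Group membership for each polygon; every group is spatially contiguous
geo.sf$group <- res$groups
```

With real data, only the part that builds geo.sf should need to change;
plot(geo.sf["group"]) then gives a quick look at the resulting regions.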

HTH,
Ákos
_____
Ákos Bede-Fazekas
Centre for Ecological Research, Hungary

On 2022.07.08 at 6:05, Danlin Yu wrote:
> Dear List members:
>
>
> I have recently attempted a regionalization analysis of a group of
> geographic regions, each of which has multiple attributes (A1, A2, A3,
> ...). The goal differs from a regular regionalization problem (such as
> K-means), in which you define groups with minimal within-group
> dissimilarity but maximal between-group dissimilarity.
>
>
> My regionalization is the opposite: I want the groups to be as similar
> to one another as possible in terms of means, variances, and other
> statistics (the units within a group do not have to be as dissimilar as
> possible; that is of less concern). I ran into the minDiff package and
> its successor, the anticlust package, in R, and it does the job
> wonderfully except for one problem: since this is a regionalization
> problem, I really want the final groups to be geographically connected
> (spatially constrained). Results from minDiff/anticlust, however, show
> the different groups mixed with one another all over the map. Here is
> some sample code:
>
>
> A data frame containing the geographic units and their attributes is
> read from a shapefile and stored in geo.df.
>
>
> library(sf)
> library(anticlust)
> geo.df <- as.data.frame(read_sf(dsn = getwd(), layer = "geolayer",
>                                 stringsAsFactors = FALSE))
> geo.df$class <- anticlustering(geo.df[, c("A1", "A2", "A3", "A4", ..., "An")],
>                                K = 5, objective = "variance",
>                                standardize = TRUE)
>
> I've tried including coordinates, and also pairwise distances, in the
> list of attributes (A1, A2, ..., An), but neither worked. I always
> ended up with well-separated groups that were all mixed with one
> another in geographic space.
>
>
> Any pointers on how to proceed from here? Any hints will be greatly
> appreciated.
>
>
> Thank you all in advance.
>
>
> Best,
>
> Danlin Yu
>