# [R] Cluster procedure using geographical neighborhood

Martin Maechler maechler at stat.math.ethz.ch
Fri May 7 12:28:39 CEST 2010

Dear Dario Sacco,

>>>>> "DS" == Dario Sacco <dario.sacco at unito.it>
>>>>>     on Thu, 06 May 2010 17:45:30 +0200 writes:

DS> Dear Dr. Maechler,
DS> I am an agronomist and a researcher at the University of Turin. I
DS> also teach "Applied statistics", so I have some knowledge of
DS> statistics, but not of numerical computation.

DS> I found your email on the CRAN website.

DS> At present I am working on the segmentation of a GIS database. My
DS> problem is that I have a set of points over a region and I need to
DS> define subregions characterised by small internal variability.
DS> The application seems to call for a hierarchical cluster analysis, but
DS> the agglomeration procedure should consider only pairs of clusters or
DS> of points that are neighbours.

DS> This can be performed by deleting the dissimilarities in the
DS> dissimilarity matrix (for example calculated with the dist() procedure
DS> in R) that refer to pairs of points that are not neighbours.

Deleting is not ok; you should make them "large" in some way.
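
A minimal sketch of what "large" could look like, under stated
assumptions: the data `x` are made up, `nb` is a hypothetical logical
neighbour matrix (TRUE where two points are neighbours), and the
penalty `big` is an arbitrary choice, not a recommendation.

```r
set.seed(1)
n  <- 6
x  <- matrix(rnorm(n * 2), ncol = 2)   # made-up attribute data
d  <- as.matrix(dist(x))               # ordinary dissimilarities

## hypothetical neighbour structure: points i and i+1 are neighbours
nb <- abs(outer(1:n, 1:n, "-")) <= 1

## instead of deleting entries (NA), replace the non-neighbour ones
## by a value larger than every real dissimilarity
big <- 10 * max(d)
d2  <- d
d2[!nb] <- big

hc <- hclust(as.dist(d2), method = "single")
```

With single linkage and a connected neighbour structure, every merge
in this sketch happens below `big`, i.e. through a pair of neighbours.
Other linkage methods would behave differently.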

I think you should just define your dissimilarities by *both*
the attribute dissimilarity *and* the geographical distance, maybe
giving much more weight to the geographical distance, something like

D_{i,j} :=  d_{i,j} +  w * d~(X_i, X_j)

where d_{i,j} are your dist() or daisy() dissimilarities,
'w' is a weight factor and d~(u,v) is e.g. the geodesic distance
between u and v.
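
The combined dissimilarity could be sketched in R roughly as follows.
Everything here is illustrative: `coords` and `attrs` are made-up
data, planar Euclidean distance stands in for a geodesic distance, and
the values of `w`, the linkage method and `k` are arbitrary.

```r
set.seed(1)
n      <- 8
coords <- cbind(x = runif(n), y = runif(n))  # made-up point locations
attrs  <- matrix(rnorm(n * 3), ncol = 3)     # made-up attribute data

d_attr <- dist(attrs)    # d_{i,j}: attribute dissimilarities
d_geo  <- dist(coords)   # d~(X_i, X_j): Euclidean stand-in for a
                         # geodesic distance

w <- 5                   # weight factor, chosen arbitrarily here
D <- d_attr + w * d_geo  # D_{i,j} := d_{i,j} + w * d~(X_i, X_j)

hc     <- hclust(D, method = "average")
groups <- cutree(hc, k = 3)
```

A larger `w` pushes the clustering towards spatially compact groups.
It does not strictly forbid merging non-neighbours.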

I'm CC'ing this to the R-help mailing list,
as I think you could get more advice from there.

Martin Maechler, ETH Zurich

DS> However, if I do that the procedure hclust() does not work anymore.
DS> Moreover, even if it did work, after the first agglomeration any
DS> further agglomeration should take into account only pairs of points or
DS> clusters that are geographical neighbours.
DS> My idea is to create a procedure able to read the list of pairs of
DS> points that are neighbours and, after each agglomeration, indicate to
DS> the procedure which pairs are neighbours, but I am not able to
DS> understand the source code that I downloaded from the CRAN website.
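
A naive sketch of such a neighbour-aware agglomeration, written in
plain R commands. This is not the algorithm behind hclust(); the
function name `constrained_merge`, its arguments `nb` and
`n_clusters`, and the choice of average linkage are all assumptions
made for illustration. Two clusters are treated as neighbours if any
pair of their points is.

```r
## d:  full symmetric dissimilarity matrix
## nb: logical symmetric matrix, TRUE where two points are neighbours
constrained_merge <- function(d, nb, n_clusters = 1) {
  members <- as.list(seq_len(nrow(d)))  # points in each active cluster
  heights <- numeric(0)
  diag(nb) <- FALSE
  while (length(members) > n_clusters) {
    k <- length(members)
    best <- Inf; bi <- NA; bj <- NA
    for (i in seq_len(k - 1)) {
      for (j in (i + 1):k) {
        a <- members[[i]]; b <- members[[j]]
        ## skip pairs of clusters that share no neighbouring points
        if (!any(nb[a, b])) next
        h <- mean(d[a, b])              # average linkage
        if (h < best) { best <- h; bi <- i; bj <- j }
      }
    }
    if (is.na(bi)) break                # no neighbouring clusters left
    heights <- c(heights, best)
    members[[bi]] <- c(members[[bi]], members[[bj]])
    members[[bj]] <- NULL               # drop the absorbed cluster
  }
  list(clusters = members, heights = heights)
}

## made-up example: six points whose neighbours form a chain 1-2-...-6
set.seed(1)
n   <- 6
x   <- matrix(rnorm(n * 2), ncol = 2)
d   <- as.matrix(dist(x))
nb  <- abs(outer(1:n, 1:n, "-")) == 1
res <- constrained_merge(d, nb, n_clusters = 2)
```

Because only neighbouring clusters may merge, each cluster in this
chain example is a run of consecutive point indices. The double loop
is slow for large n and is meant for reading, not for production use.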

DS> So, my questions are:
DS> could you help me in solving the problem?
DS> Or, alternatively, could you send me the agglomeration procedure
DS> applied by R in hcluster() as a program written in R commands or as
DS> code for Visual Basic? These two programming languages are the only
DS> two that I am able to understand.