[R] Cluster procedure using geographical neighborhood

Fri May 7 12:28:39 CEST 2010

Dear Dario Sacco,

>>>>> "DS" == Dario Sacco <dario.sacco at unito.it>
>>>>>     on Thu, 06 May 2010 17:45:30 +0200 writes:

    DS> Dear Dr. Maechler,
    DS> I am an agronomist and a researcher at the University of Turin. I am 
    DS> also teaching "Applied statistics", then I have some knowledge in 
    DS> Statistics, but not in numerical computation.

    DS> I found your email at the Cran website.

    DS> At present I am working on the segmentation of a GIS database. My problem is 
    DS> that I have a set of points over a region and I need to define 
    DS> subregions characterised by small internal variability.
    DS> The application seems to apply a hierarchical cluster analysis, but the 
    DS> agglomeration procedure should consider only pairs of clusters or of 
    DS> points that are neighbours.

    DS> This can be performed deleting the dissimilarities in the dissimilarity 
    DS> matrix (for example calculated with the dist() procedure in R) that 
    DS> refers to pairs of points that are not neighbours.

Deleting is not OK; you should make them "large" in some way instead.
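
A minimal sketch of what "making them large" could look like. The toy
data, the neighbour rule (points closer than 0.3) and the penalty `big`
are all assumptions chosen for illustration, not part of the original
question:

```r
# Sketch with made-up data; the neighbour rule and 'big' are assumptions.
set.seed(2)
n    <- 20
xy   <- cbind(x = runif(n), y = runif(n))   # hypothetical coordinates
vals <- rnorm(n)                            # hypothetical measured variable

d  <- as.matrix(dist(vals))                 # attribute dissimilarities
nb <- as.matrix(dist(xy)) < 0.3             # assumed rule: neighbours are closer than 0.3

big <- 10 * max(d)                          # finite penalty, larger than any real dissimilarity
d2  <- d
d2[!nb] <- big                              # penalise non-neighbour pairs instead of deleting them

hc <- hclust(as.dist(d2), method = "single")
```

With single linkage, every merge at a height below `big` goes through at
least one neighbour pair. With other linkage methods (e.g. "average" or
"complete") the penalty only discourages non-neighbour merges; it does
not strictly forbid them.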

I think you should just define your  dissimilarities by *both*
the "variability" (your current dist())
*and* the geographical distance, maybe giving much more weight
to the geographical distance, something like

   D_{i,j} :=  d_{i,j} +  w * d~(X_i, X_j)

where d_{i,j} are your dist() or daisy() dissimilarities,
'w' is a weight factor and d~(u,v) is e.g. the geodesic distance
between the locations u and v.
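
As a rough illustration of such a combined dissimilarity, here is a
sketch on made-up data. The coordinates, the attribute columns, the
weight w = 5 and the number of groups are all assumptions, and plain
Euclidean distance on planar coordinates stands in for the geodesic
distance:

```r
# Sketch only: toy data; 'w' and 'k' are arbitrary assumptions.
set.seed(1)
n    <- 30
xy   <- cbind(x = runif(n), y = runif(n))      # hypothetical planar coordinates
vars <- cbind(v1 = rnorm(n), v2 = rnorm(n))    # hypothetical attributes at each point

d_attr <- as.matrix(dist(scale(vars)))  # d_{i,j}: attribute dissimilarity
d_geo  <- as.matrix(dist(xy))           # d~(X_i, X_j): Euclidean stand-in for geodesic

w <- 5                                  # weight given to geographical distance
D <- as.dist(d_attr + w * d_geo)        # D_{i,j} = d_{i,j} + w * d~(X_i, X_j)

hc     <- hclust(D, method = "average")
groups <- cutree(hc, k = 4)             # cut the tree into 4 groups
table(groups)
```

A larger `w` gives geographically more compact groups at the cost of
more within-group variability, so it is worth trying a few values.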

I'm CC'ing this to the R-help mailing list,
as I think you could get more advice from there.

Martin Maechler, ETH Zurich

    DS> However, if I do that, the procedure hclust() does not work anymore. 
    DS> Moreover, even if it did work, after the first agglomeration any 
    DS> further agglomeration should take into account only pairs of points or 
    DS> clusters that are geographical neighbours.
    DS> My idea is to create a procedure able to read the list of pairs of points 
    DS> that are neighbours and, after each agglomeration, indicate to the 
    DS> procedure which pairs are neighbours, but I am not able to understand the 
    DS> source code that I downloaded from the CRAN web site.

    DS> So, my questions are:
    DS> could you help me in solving the problem?
    DS> Or, alternatively, could you send to me the agglomeration procedure 
    DS> applied by R in hcluster() as a programme written in command of R or as 
    DS> a code for Visual Basic. These two programming language are the only two 
    DS> that I am able to understand.

    DS> Thank you in advance for any suggestion or help you will give me.
    DS> Best regards,

    DS> Dario Sacco

    DS> -- 
    DS> Dr. Dario Sacco
    DS> Dept. of Agronomy, Forestry and Land Management
    DS> University of Turin