[R] Cluster procedure using geographical neighborhood
Martin Maechler
maechler at stat.math.ethz.ch
Fri May 7 12:28:39 CEST 2010
Dear Dario Sacco,
>>>>> "DS" == Dario Sacco <dario.sacco at unito.it>
>>>>> on Thu, 06 May 2010 17:45:30 +0200 writes:
DS> Dear Dr. Maechler,
DS> I am an agronomist and a researcher at the University of Turin. I am
DS> also teaching "Applied statistics", so I have some knowledge of
DS> statistics, but not of numerical computation.
DS> I found your email on the CRAN website.
DS> I am currently working on segmentation of a GIS database. My problem is
DS> that I have a set of points over a region and I need to define
DS> sub-regions characterised by small internal variability.
DS> The application seems to apply a hierarchical cluster analysis, but the
DS> agglomeration procedure should consider only pairs of clusters or of
DS> points that are neighbours.
DS> This can be done by deleting the dissimilarities in the dissimilarity
DS> matrix (for example calculated with the dist() procedure in R) that
DS> refer to pairs of points that are not neighbours.
Deleting is not OK; you should make them "large" in some way.
I think you should just define your dissimilarities by *both*
the "variability" (your current dist())
*and* the geographical distance, maybe giving much more weight
to the geographical distance, something like
D_{i,j} := d_{i,j} + w * d~(X_i, X_j)
where d_{i,j} are your dist() or daisy() dissimilarities,
'w' is a weight factor and d~(u,v) is e.g. the geodesic distance
between u and v.
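
A minimal sketch of the combined dissimilarity in R may help make this
concrete. Everything in it is an assumption for illustration: the data
frame 'pts', its columns x, y and z, the weight w = 5, the radius r and
the choice of linkage are all hypothetical, and plain Euclidean distance
on the coordinates stands in for the geodesic distance d~. Note that
neither variant strictly enforces spatial contiguity; they only make
merging distant points expensive.

```r
## Hypothetical example data: coordinates (x, y) and one measured variable z
set.seed(1)
n   <- 30
pts <- data.frame(x = runif(n), y = runif(n))
pts$z <- pts$x + rnorm(n, sd = 0.1)

d.var <- dist(scale(pts$z))          # "variability" dissimilarities d_{i,j}
d.geo <- dist(pts[, c("x", "y")])    # Euclidean stand-in for d~(X_i, X_j)

## D_{i,j} = d_{i,j} + w * d~(X_i, X_j), with an assumed weight w
w <- 5
D <- as.dist(as.matrix(d.var) + w * as.matrix(d.geo))

hc  <- hclust(D, method = "average")
grp <- cutree(hc, k = 4)             # cut the tree into 4 groups (arbitrary)

## Variant: instead of deleting non-neighbour entries, make them "large".
## Pairs further apart than an assumed radius r get a big penalty added.
r   <- 0.3
big <- 10 * max(d.var)
M   <- as.matrix(d.var)
far <- as.matrix(d.geo) > r          # TRUE for non-neighbouring pairs
M[far] <- M[far] + big
hc2 <- hclust(as.dist(M), method = "single")
```

Plotting the groups on the map, e.g. plot(pts$x, pts$y, col = grp), is a
quick way to judge whether w (or r) is large enough for the sub-regions
to look geographically compact.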
I'm CC'ing this to the R-help mailing list,
as I think you could get more advice from there.
Martin Maechler, ETH Zurich
DS> However, if I do that, the procedure hclust() does not work anymore.
DS> Moreover, even if it did work, after the first agglomeration any
DS> further agglomeration should take into account only pairs of points or
DS> clusters that are geographical neighbours.
DS> My idea is to create a procedure that reads the list of pairs of points
DS> that are neighbours and, after each agglomeration, tells the
DS> procedure which pairs are neighbours, but I am not able to understand the
DS> source code that I downloaded from the CRAN website.
DS> So, my questions are:
DS> could you help me in solving the problem?
DS> Or, alternatively, could you send me the agglomeration procedure
DS> applied by R in hcluster() as a program written in R commands or as
DS> Visual Basic code? These two programming languages are the only two
DS> that I am able to understand.
DS> Thank you in advance for any suggestion or help you will give me.
DS> Best regards,
DS> Dario Sacco
DS> --
DS> Dr. Dario Sacco
DS> Dept. of Agronomy, Forestry and Land Management
DS> University of Turin