[R-sig-Geo] spatial clustering taking account of "value"

Sean O'Riordain seanpor at acm.org
Wed Mar 10 06:51:53 CET 2010


Good morning,

I'm afraid I don't even know *exactly* what I'm looking for - apart
from some guidance please!

I have about 1.5 million (x,y,value) triples - for the most part these
are independent of each other - building location and sum insured.
I'm sure there are *lots* of clusters but I've no idea how many, and
I'm really only interested in looking at the clusters of highest
value.

I've already programmed a simple tagging of total value within 500
metres of every location - though not every building is accurately
located - some are only geocoded to UK postcode level, so all buildings
in a postcode have the same coordinates.

I'm looking to highlight "clusters" (definition unclear!) where there
are a number of points "close together" (definition unclear!) and the
sum of all the values in the "cluster" is "high".  I'm happy to ignore
all "low" valued clusters or points which are of low value and all on
their own.  There could be a maximum threshold distance (say 5km) or
space between points beyond which it is definitely not part of a
cluster.  The algorithm doesn't have to perfectly identify all
clusters - I'm quite happy to start by looking at a small (say the
top 10) set of highest valued "clusters".

I've looked at a variety of sources on the web - but it is my
understanding that 1 million+ points is considered *very* big for most
clustering algorithms.  I've only come across clustering by distance
rather than sum of value and distance - I'm probably missing something
or mis-interpreting what I'm seeing!  I think I'm looking for a
modified form of density clustering...  Clearly I can't create a
full-size distance matrix and perfection isn't expected! :-)  A
modified DBSCAN looks like it might be what I'm looking for?
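
To make the "modified DBSCAN" idea concrete, here is a rough sketch
(in Python, purely for illustration) of one possible variant: a point
counts as a "core" point when the summed value within eps of it reaches
a threshold, rather than when a minimum number of neighbours is found.
The names weighted_dbscan, top_clusters, eps and min_value are
placeholders of my own, not anything from an existing package, and the
grid index is only there to avoid building a full distance matrix.

```python
from collections import defaultdict, deque
from math import floor


def weighted_dbscan(points, eps, min_value):
    """Cluster (x, y, value) triples, DBSCAN-style.

    A point is 'core' when the summed value within eps of it (itself
    included) reaches min_value. Returns one label per point:
    a cluster id >= 0, or -1 for noise.
    """
    # Grid index with cell size eps: all neighbours within eps of a
    # point lie in the 3x3 block of cells around that point's cell.
    grid = defaultdict(list)
    for i, (x, y, _) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(i)

    def neighbours(i):
        x, y, _ = points[i]
        cx, cy = floor(x / eps), floor(y / eps)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    px, py, _ = points[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                        found.append(j)
        return found

    def is_core(nbrs):
        return sum(points[j][2] for j in nbrs) >= min_value

    labels = [-1] * len(points)
    visited = [False] * len(points)
    cluster = 0
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = neighbours(i)
        if not is_core(nbrs):
            continue  # stays noise unless a core point reaches it later
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            nbrs_j = neighbours(j)
            if is_core(nbrs_j):
                queue.extend(nbrs_j)  # j is core: keep expanding through it
        cluster += 1
    return labels


def top_clusters(points, labels, k=10):
    """Return the k clusters with the highest total value as (id, total)."""
    totals = defaultdict(float)
    for (_, _, value), label in zip(points, labels):
        if label >= 0:
            totals[label] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Buildings that share a postcode coordinate simply add their values
together in this scheme, and low-valued isolated points fall out as
noise (label -1), which seems to match what I described above.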

Clearly an alternative to clustering is some sort of density algorithm
that allows for value - but I can't quite get my head around how this
might work.
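
One way such a value-weighted density might work, as far as I can
tell, is to bin the values onto a coarse grid and then smooth the grid
with a kernel, so that each cell ends up with a distance-discounted sum
of nearby value. The sketch below (Python again, illustration only)
assumes a Gaussian kernel; weighted_density, cell and bandwidth are
placeholder names and the default sizes are arbitrary.

```python
from collections import defaultdict
from math import exp, floor


def weighted_density(points, cell=500.0, bandwidth=1000.0):
    """Value-weighted kernel density on a square grid.

    Returns {(col, row): smoothed value}. Each grid cell collects the
    binned value of nearby cells, down-weighted by a Gaussian of the
    centre-to-centre distance.
    """
    # Step 1: sum the values falling in each grid cell.
    binned = defaultdict(float)
    for x, y, value in points:
        binned[(floor(x / cell), floor(y / cell))] += value

    # Step 2: precompute Gaussian weights, truncated at roughly three
    # bandwidths along each axis.
    reach = int(3 * bandwidth / cell)
    kernel = {}
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            d2 = (dx * cell) ** 2 + (dy * cell) ** 2
            kernel[(dx, dy)] = exp(-d2 / (2 * bandwidth ** 2))

    # Step 3: spread each occupied cell's value over its neighbourhood.
    density = defaultdict(float)
    for (cx, cy), value in binned.items():
        for (dx, dy), weight in kernel.items():
            density[(cx + dx, cy + dy)] += value * weight
    return dict(density)
```

The cells with the largest smoothed values would then be candidate
"hot spots". The work scales with the number of occupied cells rather
than the number of points, which matters at 1.5 million points.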

Could someone point me in the right direction - what other keywords
should I be looking out for?  What R packages are worth a look?

Thanks in advance,
Sean O'Riordain
Dublin,
Ireland


