[R-sig-Geo] Looking for a clustering indicator

Wed Feb 17 01:46:01 CET 2010

Etienne Bellemare Racine (etiennebr at gmail.com) writes:

> I am looking for a way to tell how much clustered or not a process is
> (on a numeric scale). I've tried to test for CSR, but the result is too
> narrow as it only tell if the process is random and uniform inside a
> confidence interval. I would like to have an indicator going e.g. from
> random, to clustered, to very clustered. Do you know any way I could do
> that on a more than 3000 points pattern ?

> Maybe I've overlooked a simple CSR test, which could be interpreted to
> give a non-boolean answer ?

If I'm not mistaken, you are asking about spatial point pattern data (so Moran's I is not applicable).

A hypothesis test is designed to give a yes/no answer. To get a measure of clustering, you need a summary statistic of some kind.

You could use the K-function or one of the other classical summary statistics (G-function, F-function, etc.). The values of these functions are indicators of the degree of clustering or regularity in the point pattern. Choose a particular distance r. Then K(r) suggests clustering if K(r) > pi * r^2 and suggests regularity if K(r) < pi * r^2, where pi * r^2 is the value expected under CSR. The value of K(r), compared with pi * r^2, is a measure of the degree of clustering or regularity. Similarly for the other summary functions.
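As a rough sketch of this comparison in 'spatstat': the simulated pattern X, the distance r0 = 0.05 and the name 'cluster_index' are only assumptions for illustration, and the L-scale difference sqrt(K(r)/pi) - r is one common way (not the only one) to put the comparison with pi * r^2 on a distance scale.

```r
library(spatstat)

set.seed(42)
# Hypothetical data: a clustered pattern (Matern cluster process) in the
# unit square. Replace X with your own point pattern (class "ppp").
X <- rMatClust(20, 0.05, 10)

# Estimate the K-function with isotropic edge correction.
K <- Kest(X, correction = "isotropic")
Kfun <- as.function(K)   # evaluates the estimate at any r within its range

r0 <- 0.05               # assumed distance of interest
K_obs <- Kfun(r0)        # estimated K(r0) for the data
K_csr <- pi * r0^2       # value of K(r0) under CSR

# Same comparison on the L scale: L(r) - r, where L(r) = sqrt(K(r) / pi).
# Positive values suggest clustering, negative values suggest regularity.
cluster_index <- sqrt(K_obs / pi) - r0

cat("K(r0) =", K_obs, " pi * r0^2 =", K_csr,
    " L(r0) - r0 =", cluster_index, "\n")
```

The number you get depends on the choice of r0, so it is worth reporting the index at a few distances rather than just one.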

I assume that you calculated envelopes of the K-function (for example) based on simulation from CSR, and plotted these together with the estimated K-function from the data point pattern. This is equivalent to a hypothesis test (it is NOT equivalent to a confidence interval). The test statistic is the rank of the observed value of K(r) amongst the simulated values of K(r). You could use this rank as a measure of clustering or repulsion.
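A minimal sketch of the rank idea, done by hand rather than through envelope() so that the rank is visible. The simulated pattern X, the distance r0, the number of simulations and the choice to simulate CSR with the same number of points as the data are all assumptions for illustration.

```r
library(spatstat)

set.seed(42)
# Hypothetical data: a clustered pattern in the unit square.
# Replace X with your own point pattern (class "ppp").
X <- rMatClust(20, 0.05, 10)

r0   <- 0.05   # assumed distance of interest
nsim <- 99     # assumed number of CSR simulations

# Helper (hypothetical name): estimated K at distance r0 for a pattern P.
K_at <- function(P, r) {
  as.function(Kest(P, correction = "isotropic"))(r)
}

K_obs <- K_at(X, r0)

# Simulate CSR in the same window, with the same number of points as X.
K_sim <- replicate(nsim, {
  Y <- runifpoint(npoints(X), win = Window(X))
  K_at(Y, r0)
})

# Rank of the observed value among observed + simulated values (1 .. nsim + 1).
rank_obs <- rank(c(K_obs, K_sim))[1]

# Rescaled to (0, 1]: near 1 suggests clustering, near 0 suggests regularity.
rank_score <- rank_obs / (nsim + 1)
cat("rank =", rank_obs, "of", nsim + 1, " score =", rank_score, "\n")
```

Note that the rank saturates: once the data are more clustered than every simulation, it cannot distinguish "clustered" from "very clustered", which is one reason to prefer a fitted model parameter.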

However, the most precise way to get an estimate of the degree of clustering is to fit a model to the data, and use the value of an appropriate parameter in the model. For example, computing K(r) is equivalent to fitting a Strauss point process model with interaction range r. The interaction parameter 'gamma' of the Strauss process is a measure of the degree of regularity. There are many other models you could use. The Geyer saturation model allows both clustering and regularity. The interaction parameter 'gamma' of the Geyer model ranges from 0 to infinity with gamma < 1 indicating regularity and gamma > 1 indicating clustering.

In the package 'spatstat' you can fit the Strauss process model with r=0.2 to a point pattern dataset X by typing
      fit <- ppm(X, ~1, Strauss(0.2))
Then printing 'fit' gives the interaction parameter gamma.
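
If you want gamma as a number rather than reading it off the printout, something like the following sketch should work. It uses the built-in datasets 'cells' and 'redwood' purely as stand-ins for your data; the interaction ranges and the saturation value sat = 2 are illustrative assumptions, and it relies on the fitted interaction coefficient being log(gamma), stored under the name "Interaction".

```r
library(spatstat)

# Strauss model on a regular pattern (built-in dataset 'cells').
# r = 0.1 is an assumed interaction range for illustration.
fit_strauss <- ppm(cells, ~1, Strauss(0.1))

# The canonical interaction coefficient is log(gamma).
gamma_strauss <- exp(coef(fit_strauss)[["Interaction"]])

# Geyer saturation model on a clustered pattern (built-in dataset 'redwood').
# r = 0.05 and sat = 2 are assumed values, not recommendations.
fit_geyer <- ppm(redwood, ~1, Geyer(r = 0.05, sat = 2))
gamma_geyer <- exp(coef(fit_geyer)[["Interaction"]])

# gamma < 1 indicates regularity; for the Geyer model gamma > 1
# indicates clustering.
cat("Strauss gamma (cells):", gamma_strauss, "\n")
cat("Geyer gamma (redwood):", gamma_geyer, "\n")
```

The estimate of gamma depends on the chosen range r (and on sat for the Geyer model), so values are only comparable between patterns fitted with the same settings.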

For more information, please read the spatstat workshop notes: <http://www.csiro.au/resources/pf16h.html>

Adrian Baddeley