[R-sig-Geo] pair correlation function - changes btw 2005-2009?

Mon Aug 10 04:52:50 CEST 2009

"Friderike Oehler" writes:

> I am still struggling with the pcf (spatstat package):
> I try to reproduce a plot that I did in 2005 using exactly the
> same point pattern (a small sample of 18 points, extremely clustered)
> and the same pcf-parameters:
> kernel="epanechnikov", stoyan=0.15, correction="Ripley" (code see below).

> However there is a shift on the r-axis in my present plot against the one
> from 2005: The form of the curve is the same, but while in 2005 I had one
> prominent peak at about r=135, the same peak is now located at approx.
> r=250. Also, in the earlier plot, g(r)=1 was reached at about r=20, now at
> r=55.

> I checked the FAQs and Latest releases page at
> http://www.spatstat.org/spatstat/, but could not find what could cause this
> different behaviour of the function. Is it anyway possible that a change in
> the code of the function since 2005 causes the observed differences? If not,
> what else could it be?

Yes, the code for 'pcf' has certainly changed since 2005. A lot of related code in 'R' (such as the code in density.default that is used by pcf.ppp) has also changed since 2005.

First, please check that the window for the point pattern (as well as the x and y coordinates) is exactly the same as the one you used in 2005. The choice of window affects the edge corrections and hence the final result.
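
A rough sketch of such a check follows. It assumes the 2005 pattern can still be reloaded; the two small patterns X2005 and X2009 below are hypothetical stand-ins with made-up coordinates and windows, not the real data.

```r
library(spatstat)

# Hypothetical stand-ins for the 2005 and 2009 versions of the pattern:
# same points, but (deliberately) different windows.
x <- c(10, 12, 15, 260, 262)
y <- c(20, 22, 19, 30, 33)
X2005 <- ppp(x, y, window = owin(c(0, 400), c(0, 400)))
X2009 <- ppp(x, y, window = owin(c(0, 500), c(0, 500)))

# Are the coordinates the same?
same_coords <- isTRUE(all.equal(X2005$x, X2009$x)) &&
               isTRUE(all.equal(X2005$y, X2009$y))

# Are the windows the same? Compare the bounding ranges of each window.
W1 <- as.owin(X2005)
W2 <- as.owin(X2009)
same_window <- isTRUE(all.equal(W1$xrange, W2$xrange)) &&
               isTRUE(all.equal(W1$yrange, W2$yrange))

print(c(coords = same_coords, window = same_window))
```

This only compares the bounding ranges, which is enough for rectangular windows. For a polygonal or mask window the boundary itself would also need to be compared.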

If that's not the problem, then the most plausible scenario is the following.

To perform the analysis described above, you had to override some of the defaults in pcf.ppp, in particular, the default maximum value of 'r'. The default is there because it is known in the literature that the standard edge corrections can introduce large bias and variance when r is large.
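
To illustrate what overriding the default looks like, here is a minimal sketch. The pattern X is synthetic (two tight clusters, invented for this example) and is not the ab2 data; the kernel, stoyan and correction values are the ones quoted above.

```r
library(spatstat)

set.seed(1)
# Synthetic, extremely clustered pattern of 18 points:
# two tight clusters whose centres are 250 units apart.
x <- c(rnorm(9, 300, 5), rnorm(9, 550, 5))
y <- c(rnorm(9, 500, 5), rnorm(9, 500, 5))
X <- ppp(x, y, window = owin(c(0, 1000), c(0, 1000)))

# Default behaviour: pcf chooses the range of r itself.
g_default <- pcf(X, kernel = "epanechnikov", stoyan = 0.15,
                 correction = "Ripley")

# Overriding the default: supply the vector of r values explicitly.
r_long <- seq(0, 400, by = 2.5)
g_long <- pcf(X, r = r_long, kernel = "epanechnikov", stoyan = 0.15,
              correction = "Ripley")

# Compare the ranges of r actually used in the two estimates.
print(range(g_default$r))
print(range(g_long$r))
```

Estimates at the larger r values in g_long should be read with the caution mentioned above about bias and variance of the edge corrections.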

Since the data are extremely clustered, the most plausible explanation is that the software changes between 2005 and 2009 (which were mostly BUG FIXES), together with the extreme clustering, have caused the discrepancy in output.

The current output from pcf.ppp in spatstat 1.16-1 looks correct to me. If you look at

     hist(pairdist(ab2))

it's clear that there is a mode of pairwise distances at about r=250. This is also reflected in

    plot(Kest(ab2, r=seq(0,400,2.5)),xlim=c(0,400))

which shows a steep jump in the empirical K function at about r=250. So the blip in the pcf at r=250 is real and correct.
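
The link between a mode in the pairwise distances and a peak in the pcf can be seen in a toy example using only base R. The coordinates here are synthetic (two tight clusters of 9 points, with centres assumed to be 250 units apart) and are not the ab2 data.

```r
set.seed(42)
# Synthetic stand-in: two tight clusters of 9 points each,
# centres 250 units apart along the x axis.
xy <- rbind(cbind(rnorm(9,   0, 5), rnorm(9, 0, 5)),
            cbind(rnorm(9, 250, 5), rnorm(9, 0, 5)))

# All pairwise distances, each pair counted once: 18 * 17 / 2 = 153 values.
d <- as.vector(dist(xy))

# Histogram counts in bins of width 25 (no plot drawn here).
h <- hist(d, breaks = seq(0, 400, by = 25), plot = FALSE)

# Pairs with one point in each cluster: 9 * 9 = 81 distances near 250.
n_between <- sum(d > 200)
# Pairs inside the same cluster: 2 * choose(9, 2) = 72 short distances.
n_within <- sum(d <= 200)

print(h$counts)
print(c(within = n_within, between = n_between))
```

The between-cluster pairs pile up around the separation distance, and that is the kind of concentration the pcf picks up as a peak.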

Thus, if there was a discrepancy, it seems likely that the current code is correct and the 2005 code contained a bug that caused the discrepancy.

A summary of the change history of pcf.ppp follows. It was introduced in spatstat 1.6-4 (April 2005). The argument 'r' was introduced in spatstat 1.7-13 (October 2005). A completely new, faster algorithm for computing pairwise distances was implemented in spatstat 1.9-2 (June 2006), along with a change to the default rule for the maximum value of 'r'. The default value of the x limits for plotting was changed in spatstat 1.9-3 (June 2006). The smoothing algorithm was changed to use the new R function density.default in spatstat 1.11-0. The C routine was modified in spatstat 1.13-0. There were cosmetic changes in spatstat 1.10-0, 1.11-1, 1.12-2, 1.15-2 and 1.16-1.

regards

Adrian Baddeley