[R-sig-Geo] Replicating spatstat::pcfcross

Tue Jan 28 05:39:21 CET 2014

On 28/01/14 10:39, Noam Ross wrote:

> I have an application for which I need very fast calculation of the pair
> correlation function between two point types in a marked point pattern.
> I'm attempting to replicate results from spatstat::pcfcross. I wrote a
> quick-and-dirty version of this function to ensure that I understood it

Oh, I *so* wanted to channel Brian Ripley there for a moment %^]

In the line 

 > cp$wt = A / (W - abs(cp$dx))*(H - abs(cp$dy))

you need to put parentheses around the denominator,
because a/b*c  is parsed as (a/b)*c. Because of this line, the weights are incorrect;
they are sometimes less than 1 (instead of being always > 1) which is 
causing most of the negative bias.

The sharp drop in the estimate at r ~~ rmax is caused by the line 
     cp <- closepairs(pp, rmax)
which restricts the calculation to pairs of points with distance <= rmax.
Contributions to the kernel estimate of pcf(rmax) can occur 
from interpoint distances slightly greater than rmax. In the call to closepairs()
you need to replace 'rmax' by 'rmax + kwide' where 'kwide' is the width 
of the support of the kernel.

There are several other idiosyncrasies (things that will only
work if the window is a rectangle, there are only two possible marks, etc)
but no other major problems.

Thank you for communicating this question. It has spurred me to get on with
implementing a faster version of pcfcross in 'spatstat'.

Adrian Baddeley
----

On 28/01/14 10:39, Noam Ross wrote:

> I have an application for which I need very fast calculation of the pair
> correlation function between two point types in a marked point pattern.
>
> I'm attempting to replicate results from spatstat::pcfcross.  I wrote a
> quick-and-dirty version of this function to ensure that I understood it
> properly before moving everything to C and optimizing for my particular
> application. However, I'm not quite getting the same results as with
> pcfcross.
>
> I note that, while pcfcross calculates the pcf for each mark type, then
> calculates the cross-pcf from those, I am calculating the cross-pcf
> directly from the distances between heterogeneous pairs. This seems like a
> more efficient approach when I am only interested in the latter. The
> resulting estimated pcf consistently lower than that from pcfcross at long
> distances, and sometimes very different from pcfcross near r=0.  What could
> account for this difference? Is it expected given the different methods?
>
> I've included the function below and an example plotting the spatstat and
> my own version against each other.
>
> #set.seed(0)
> library(spatstat)
> pp = rmpoispp(lambda=c(100,100), win=c(0,1,0,1), types=c("S", "I"))
>
> pcfcross2 <- function(pp, rmax = 0.25, stoyan = 0.15) {
>
>    #Generate distances between close pairs, limit only to heterogenous pairs
>    cp = closepairs(pp, rmax)
>    dups = duplicated(cp$d)
>    cp = lapply(cp, function(z) z[!dups])
>    Infected = as.integer(marks(pp) == "I")
>    SIpairs = which(Infected[cp$i] + Infected[cp$j] == 1)
>    cp = lapply(cp, function(z) z[SIpairs])
>
>    #ppp statistics
>    win = as.owin(pp)
>    H = diff(win$yrange)
>    W = diff(win$xrange)
>    A = H*W
>    lambda = sqrt(prod(intensity(pp))/A)
>
>    #calculate translational edge weights
>    cp$wt = A / (W - abs(cp$dx))*(H - abs(cp$dy))
>    wtot = sum(cp$wt)
>
>    #Density parameters and calculation
>    h <- stoyan/sqrt(pp$n/A)
>    bw <- h/sqrt(5)
>    dens = density.default(cp$d, weights=cp$wt/wtot, bw = bw,
>                           kernel="epanechnikov", from=0, to=rmax,
>                           n = 513)
>
>    #Scale to g(r)
>    r = dens$x
>    y = dens$y*wtot / (2*pi*r*lambda^2*A)
>    y[which(y==Inf)] = NA
>    return(data.frame(r=r, y=y))
> }
>
>
> pcf1 = pcfcross(pp, "S", "I", correction="translate")
> plot(pcf1)
> pcf2 = pcfcross2(pp)
> lines(pcf2, col="blue")