[R-sig-Geo] memory limitations to markstat

Fri Feb 1 04:45:27 CET 2008

Ian Robertson <igr at stanford.edu> writes:

> I have been running into memory-related problems trying to use markstat
> (with 'table' as its supplied function) to assemble tabulations of
> categorical marks within fixed distances around about 5000 points.
> Experimenting with random data suggests that my default memory settings
> allow markstat to handle around 3200 points. At 3225 points, I get the
> message "Error: cannot allocate vector of size 317.4 Mb". Does anyone
> know what vector R is trying to store? Can anyone suggest a work around?

> I imagine that the various K-function tools in spatstat have to be
> making tabulations similar to what I have attempted to do with markstat.
> I have experimented with Kdot, forcing it to do a similar amount of work
> by making all the marks the same. Kdot can handle at least 5000 points
> (but not 5500) but since it doesn't return any mark-tabulations, I don't
> think it will help me.

We're talking about the package 'spatstat'.

Although `markstat' and the K-function tools (Kest, Kdot, Kcross etc)
perform somewhat similar calculations, they are based on different
objectives. The K-function tools are designed to calculate a specific
summary of the r-neighbourhood (the points within distance r of the
current point) for ALL values of r. On the other hand 'markstat' is
designed to calculate ANY summary of the r-neighbourhood for a FIXED value
of r.

Also, the K-functions are designed for speed, while `markstat' is
designed for complete flexibility (the summary operation to be applied
to each r-neighbourhood can be specified by any R function). The
K-functions use fairly efficient C routines to identify the
r-neighbourhoods (Kest has been tested on patterns of 1 million
points). `markstat' creates a huge n x n matrix of all pairwise
distances and uses `apply' to compute the desired summary values, so
its memory use grows as n^2.
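
To give a feel for the sizes involved, here is a rough back-of-envelope
calculation in R. It assumes only that each entry of an n x n distance
matrix is a double (8 bytes) and that "Mb" in R's error message means
1024^2 bytes; `matrix_mb' is a name made up for this illustration, not
part of spatstat.

```r
# Memory (in Mb, 1 Mb = 1024^2 bytes) for one n x n matrix of doubles.
# Hypothetical helper for illustration only; not a spatstat function.
matrix_mb <- function(n) 8 * n^2 / 1024^2

matrix_mb(3225)          # about 79.4 Mb
matrix_mb(5000)          # about 190.7 Mb

# Ratio of the reported allocation (317.4 Mb) to one such matrix
317.4 / matrix_mb(3225)  # about 4
```

So the failed allocation is about the size of four such matrices. This
arithmetic does not identify which intermediate object R was trying to
create, only that it is of order n^2.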

> Here's some illustrative code:
> library(spatstat)
> npoints <- 3200 #works
> #npoints <- 3225 #fails
> east <- runif(npoints, 1, 100)
> north <- runif(npoints, 1, 100)
> mark <- ceiling(runif(npoints, 0, 4))
> ppo1 <- ppp(east, north, c(0, 100), c(0, 100), marks=factor(mark))
> mTab <- markstat(ppo1, R=5, table, exclude=T)

If I understand correctly, you want to generate a large table in which the
columns represent the points in the data pattern, the rows represent the
possible mark values, and the entries are frequencies. Thus a column with
entries 0, 2, 1, 0 means that there were 2 points with mark = 2 and 1
point with mark = 3 in the r-neighbourhood of the point in question.

The nearest existing equivalent in spatstat is Kcross (at least this is
the function that has to compute how many points of type j there are
within an r-neighbourhood of each point).
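
In the meantime, a table of this kind can be built in plain R without
ever forming the n x n matrix, by visiting one point at a time. The
following is only a minimal sketch under stated assumptions: the
function name `neighbour_mark_table' is made up here (it is not a
spatstat function), it takes raw coordinates rather than a ppp object,
it uses Euclidean distance with no edge correction, and it excludes each
point from its own neighbourhood.

```r
# Hypothetical helper (not part of spatstat): for each point, count the
# marks of the other points lying within distance R of it.
# Rows of the result are mark levels, columns are points.
neighbour_mark_table <- function(x, y, marks, R) {
  marks <- as.factor(marks)
  n <- length(x)
  nlev <- nlevels(marks)
  out <- matrix(0L, nrow = nlev, ncol = n,
                dimnames = list(levels(marks), NULL))
  for (i in seq_len(n)) {
    # Distances from point i to all points: a length-n vector only
    d <- sqrt((x - x[i])^2 + (y - y[i])^2)
    nb <- d <= R
    nb[i] <- FALSE  # assumption: the point itself is not its own neighbour
    # Count neighbours in each mark level (zeros if no neighbours)
    out[, i] <- tabulate(as.integer(marks[nb]), nbins = nlev)
  }
  out
}

# Usage with random data like that in the original example
set.seed(1)
npoints <- 5000
east <- runif(npoints, 1, 100)
north <- runif(npoints, 1, 100)
mark <- factor(ceiling(runif(npoints, 0, 4)))
mTab <- neighbour_mark_table(east, north, mark, R = 5)
dim(mTab)
```

This needs only O(n) working memory per point, at the cost of an R-level
loop over the points.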

I will implement a function `marktable' that does what you want, and add
it to the next version of spatstat (1.12-6) that should be released this
weekend.

regards
Adrian Baddeley