[R-sig-Geo] memory limitations to markstat

Roger Bivand Roger.Bivand at nhh.no
Wed Jan 30 21:33:25 CET 2008


On Wed, 30 Jan 2008, Ian Robertson wrote:

> Hello all,
>
> I have been running into memory-related problems trying to use markstat
> (with 'table' as its supplied function) to assemble tabulations of
> categorical marks within fixed distances around about 5000 points.
> Experimenting with random data suggests that my default memory settings
> allow markstat to handle around 3200 points. At 3225 points, I get the
> message "Error: cannot allocate vector of size 317.4 Mb". Does anyone
> know what vector R is trying to store? Can anyone suggest a work around?
> Here is some illustrative code based on random data:
>
> library(spatstat)
> npoints <- 3200 #works
> #npoints <- 3225 #fails
> east <- runif(npoints, 1, 100)
> north <- runif(npoints, 1, 100)
> mark <- ceiling(runif(npoints, 0, 4))
> ppo1 <- ppp(east, north, c(0, 100), c(0, 100), marks=factor(mark))
> mTab <- markstat(ppo1, R=5, table, exclude=TRUE)

Very useful example. If you say:

debug(applynbd)

and run markstat(), you will see that it operates with at least four n by n
matrices, which get stacked in an array. I think your error occurs when
the big array a is being created. applynbd() does not use quadtrees or
similar data structures. Kdot() uses a different internal infrastructure.
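
As a rough check of that reading, the size of such an array can be worked
out by hand. This is only a sketch: it assumes the array holds
double-precision values (8 bytes each) in exactly four n by n layers, and
array_mb() is a made-up helper name, not part of spatstat.

```r
# Size in Mb (2^20 bytes) of an n x n x layers array of doubles.
# Assumption: 8 bytes per element, four stacked layers.
array_mb <- function(n, layers = 4) n^2 * layers * 8 / 2^20

round(array_mb(3225), 1)  # 317.4, the size reported in the error message
round(array_mb(5000), 1)  # 762.9, what about 5000 points would need
```

Under these assumptions the figure for 3225 points agrees with the
"cannot allocate vector of size 317.4 Mb" message.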

If you are willing to try an alternative, I can let you try an unreleased 
ANN tree-based package which has a heuristic distance cutoff (it uses 
k-nearest neighbours, so k has to be adaptive in the inverse of density). 
If your distances are small relative to the total, this should work.
Please say if you prefer a source or Windows binary package. Work is going 
on to bring together several ports of ANN, but isn't ready yet.

If a quicker and dirtier solution is acceptable, try:

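library(maptools) # coercion from ppp to SpatialPointsDataFrame
library(spdep)    # provides dnearneigh()
ppo1a <- as(ppo1, "SpatialPointsDataFrame")
summary(ppo1a)
# neighbours of each point at distances between 0 and 5
d5nb <- dnearneigh(coordinates(ppo1a), 0, 5)
# tabulate the marks of each point's neighbours
mt <- sapply(d5nb, function(x) table(ppo1a$marks[x]))
str(mt)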

mt will need transposing with t(mt) to get one row per point. Because
dnearneigh() doesn't use a full or triangular distance matrix, just
distances for points one by one, its memory footprint is small. d5nb is a
list of neighbours within distance 5, so it can be used with sapply and
lapply.
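
To illustrate the point-by-point idea without any add-on packages, here is
a minimal base R sketch on random data. It is a brute-force stand-in and
not how dnearneigh() is implemented internally; the names nb and mt and
the choice of 500 points are assumptions made for the example.

```r
set.seed(1)
n <- 500
east <- runif(n, 0, 100)
north <- runif(n, 0, 100)
marks <- factor(ceiling(runif(n, 0, 4)), levels = 1:4)
R <- 5

# For each point, compute distances to all points (a vector of length n,
# never an n x n matrix) and keep the indices within R, excluding itself.
nb <- lapply(seq_len(n), function(i) {
  d <- sqrt((east - east[i])^2 + (north - north[i])^2)
  setdiff(which(d <= R), i)
})

# Tabulate neighbour marks; sapply() gives one column per point,
# so transpose to get one row per point and one column per mark level.
mt <- t(sapply(nb, function(x) table(marks[x])))
dim(mt)
```

Each row of mt then sums to the number of neighbours of that point, and
only one vector of n distances is held in memory at a time.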

Hope this helps,

Roger

>
> I imagine that the various K-function tools in spatstat have to be
> making tabulations similar to what I have attempted to do with markstat.
> I have experimented with Kdot, forcing it to do a similar amount of work
> by making all the marks the same. Kdot can handle at least 5000 points
> (but not 5500) but since it doesn't return any mark-tabulations, I don't
> think it will help me.
>
> npoints <- 5000 #works
> #npoints <- 5500 #fails
> east <- runif(npoints, 1, 100)
> north <- runif(npoints, 1, 100)
> mark <- rep(1, npoints)
> ppo1 <- ppp(east, north, c(0, 100), c(0, 100), marks=factor(mark))
> kd1 <- Kdot(ppo1, "1")
>
> I expect I could divide my study area into several appropriately
> overlapping sections, apply markstat to each, and then reassemble a
> single set of tabulations by selecting the largest 'neighbourhood set'
> available for any cases that get tabulated in more than one section.
> This seems pretty messy, but may be the way to go--short of doing the
> work in GRASS.
>
> Many thanks in advance for any help or advice.
>
> Ian Robertson

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



