[R-sig-Geo] memory limitations to markstat
Roger Bivand
Roger.Bivand at nhh.no
Wed Jan 30 21:33:25 CET 2008
On Wed, 30 Jan 2008, Ian Robertson wrote:
> Hello all,
>
> I have been running into memory-related problems trying to use markstat
> (with 'table' as its supplied function) to assemble tabulations of
> categorical marks within fixed distances around about 5000 points.
> Experimenting with random data suggests that my default memory settings
> allow markstat to handle around 3200 points. At 3225 points, I get the
> message "Error: cannot allocate vector of size 317.4 Mb". Does anyone
> know what vector R is trying to store? Can anyone suggest a work around?
> Here is some illustrative code based on random data:
>
> library(spatstat)
> npoints <- 3200 #works
> #npoints <- 3225 #fails
> east <- runif(npoints, 1, 100)
> north <- runif(npoints, 1, 100)
> mark <- ceiling(runif(npoints, 0, 4))
> ppo1 <- ppp(east, north, c(0, 100), c(0, 100), marks=factor(mark))
> mTab <- markstat(ppo1, R=5, table, exclude=TRUE)
Very useful example. If you say:
debug(applynbd)
and run markstat(), you see that it operates with at least four n by n
matrices, which get stacked in an array. I think your error exit comes when
that big array is being created. markstat() does not use quadtrees or
similar data structures. Kdot() uses a different internal infrastructure.
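As a rough check on that reading, here is a back-of-envelope estimate. It
assumes the array holds four n by n layers of 8-byte doubles; that layout is
an assumption for illustration, not something taken from the spatstat source.

```r
# Back-of-envelope memory estimate; the layout (four n x n layers of
# 8-byte doubles) is an assumption, not read from the spatstat source.
n <- 3225                 # number of points at which markstat() failed
bytes_per_double <- 8
layers <- 4
array_mb <- layers * n^2 * bytes_per_double / 1024^2
round(array_mb, 1)        # 317.4
```

Under the same assumption, 5000 points would need an array of roughly 763 Mb,
which is why an approach that avoids n by n matrices is attractive here.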
If you are willing to try an alternative, I can let you try an unreleased
ANN tree-based package which has a heuristic distance cutoff (it uses
k-nearest neighbours, so k has to be adaptive in the inverse of density).
If your distances are small relative to the total, this should work.
Please say if you prefer a source or Windows binary package. Work is going
on to bring together several ports of ANN, but isn't ready yet.
If a quicker and dirtier solution is acceptable, try:
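library(maptools)  # coercion from ppp to SpatialPointsDataFrame
library(spdep)     # dnearneigh()
ppo1a <- as(ppo1, "SpatialPointsDataFrame")
summary(ppo1a)
# neighbours of each point at distances between 0 and 5
d5nb <- dnearneigh(coordinates(ppo1a), 0, 5)
# tabulate the marks of each point's neighbours
mt <- sapply(d5nb, function(x) table(ppo1a$marks[x]))
str(mt)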
mt will need transposing with t(mt) to give one row per point. Because
dnearneigh() doesn't use a full or triangular distance matrix, just
distances for points one by one, its memory footprint is small. d5nb is a
list of neighbours within distance 5, so it can be used with sapply() and
lapply().
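The point-by-point idea can also be sketched in base R, without any spatial
packages. This only illustrates why the memory footprint stays small (one
distance vector of length n at a time); it is not how dnearneigh() is
implemented. The function name tabulate_marks_within and the use of <= for
the distance cutoff are assumptions made for the example.

```r
# Minimal base-R sketch: tabulate neighbour marks one point at a time.
# tabulate_marks_within() is a hypothetical helper, not a spatstat or
# spdep function.
set.seed(1)
npoints <- 500
east  <- runif(npoints, 1, 100)
north <- runif(npoints, 1, 100)
marks <- factor(ceiling(runif(npoints, 0, 4)), levels = 1:4)

tabulate_marks_within <- function(x, y, marks, r) {
  n <- length(x)
  # one row per point, one column per mark level
  out <- matrix(0L, nrow = n, ncol = nlevels(marks),
                dimnames = list(NULL, levels(marks)))
  for (i in seq_len(n)) {
    # distances from point i to all points: a vector, never a matrix
    d <- sqrt((x - x[i])^2 + (y - y[i])^2)
    nb <- which(d <= r)
    nb <- nb[nb != i]            # exclude the point itself
    out[i, ] <- table(marks[nb]) # counts for every level, zeros included
  }
  out
}

mt <- tabulate_marks_within(east, north, marks, r = 5)
dim(mt)
head(mt)
```

The result already has one row per point, so no transposing is needed. The
loop is slow for large n, but it only ever holds one length-n distance vector
plus the n by 4 result.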
Hope this helps,
Roger
>
> I imagine that the various K-function tools in spatstat have to be
> making tabulations similar to what I have attempted to do with markstat.
> I have experimented with Kdot, forcing it to do a similar amount of work
> by making all the marks the same. Kdot can handle at least 5000 points
> (but not 5500) but since it doesn't return any mark-tabulations, I don't
> think it will help me.
>
> npoints <- 5000 #works
> #npoints <- 5500 #fails
> east <- runif(npoints, 1, 100)
> north <- runif(npoints, 1, 100)
> mark <- rep(1, npoints)
> ppo1 <- ppp(east, north, c(0, 100), c(0, 100), marks=factor(mark))
> kd1 <- Kdot(ppo1, "1")
>
> I expect I could divide my study area into several appropriately
> overlapping sections, apply markstat to each, and then reassemble a
> single set of tabulations by selecting the largest 'neighbourhood set'
> available for any cases that get tabulated in more than one section.
> This seems pretty messy, but may be the way to go--short of doing the
> work in GRASS.
>
> Many thanks in advance for any help or advice.
>
> Ian Robertson
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no