[R-sig-Geo] Memory limit problems in R / import of maps

Roger Bivand Roger.Bivand at nhh.no
Tue Apr 22 19:29:34 CEST 2008


On Tue, 22 Apr 2008, Edzer Pebesma wrote:

> Tomislav Hengl wrote:
>> Just one last thing,
> Two?
>> if R is reporting an error message, that does not necessarily mean that there
>> is a memory limit problem with the machine
> Correct, the error message should give a hint,
>> - shouldn't there be a way to implement memory handling
>> in R in a more efficient way?
>>
> R is open source, so go ahead and modify it.
>
> As advice, first consider the resources you have, and consider the
> other options just kindly provided to you. PCs with 8 GB RAM now start
> at 500 euros, so why process massive data sets on your 2 GB notebook?

Even on my 2001 1 GB desktop (dual Xeon, but hey, not exactly high-end
now!), reading the 25 1000 x 1450 rasters went like a song:

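library(rgdal)  # attaches sp too (GridTopology, SpatialGridDataFrame)
# Write 25 test GeoTIFFs of 1000 x 1450 cells holding random values 1..10
grd <- GridTopology(c(0.5, 0.5), c(1,1), c(1000, 1450))
set.seed(1)
for (i in 1:25) {
   dta <- sample(1:10, prod(slot(grd, "cells.dim")), replace=TRUE)
   SGDF <- SpatialGridDataFrame(grd, data=data.frame(band1=dta))
   fn <- paste("kasc", i, ".tif", sep="")
   writeGDAL(SGDF, fn, drivername="GTiff", type="Byte")
}
gc()
# pattern= is a regular expression, not a shell glob: match only kascN.tif
fnames0 <- list.files(pattern="^kasc[0-9]+\\.tif$")
fnames <- gsub("\\.tif$", "", fnames0)
# Read one file just to get the grid topology and the number of cells
r1 <- readGDAL(fnames0[1], silent=TRUE)
Grd <- slot(r1, "grid")
n <- nrow(slot(r1, "data"))
# Allocate the cells x layers matrix once, then fill it one column per file
indata <- matrix(0, nrow=n, ncol=length(fnames0))
for (i in seq_along(fnames0)) {
   ingrid <- readGDAL(fnames0[i], silent=TRUE)
   indata[,i] <- ingrid[[1]]  # first (only) band as a plain vector
   cat(i, "\n")
   gc()
}
gc()
colnames(indata) <- fnames
str(indata)
# Convert to a data frame, then drop the matrix so only the data frame remains
df <- as.data.frame(indata)
gc()
rm(indata)
str(df)
gc()
# Build a single 25-layer SpatialGridDataFrame, then free the data frame
ingrid <- SpatialGridDataFrame(Grd, data=df)
gc()
rm(df)
gc()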

library(adehabitat)
outkasc <- spixdf2kasc(ingrid)
...
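For a sense of scale, the arithmetic below estimates how large one full copy of the 25-layer data is, assuming every value is held as an 8-byte double (true of the indata matrix, which matrix(0, ...) creates as double). The 1.9 GB budget is just an example figure, and the variable names are illustrative only.

```r
# Rough memory footprint of one full copy of the 25-layer data,
# assuming 8-byte doubles (as in indata <- matrix(0, ...))
n_cells  <- 1000 * 1450        # cells per raster
n_layers <- 25                 # number of rasters
bytes_per_value <- 8           # size of one double

bytes_per_copy <- n_cells * n_layers * bytes_per_value
mib_per_copy   <- bytes_per_copy / 2^20

cat(sprintf("%d cells x %d layers = %.0f values\n",
            n_cells, n_layers, n_cells * n_layers))
cat(sprintf("one copy: %.1f MiB\n", mib_per_copy))

# How many such full copies fit under an example budget of 1.9 GB
budget_bytes <- 1.9 * 1e9
copies_that_fit <- floor(budget_bytes / bytes_per_copy)
cat("full copies that fit in 1.9 GB:", copies_that_fit, "\n")

# Cross-check the per-value cost on a small vector with object.size()
small <- object.size(numeric(1e4))
cat("object.size(numeric(1e4)):", format(small), "\n")
```

At roughly 277 MiB per copy, only a handful of full copies fit in about 2 GB, so a function that duplicates its input several times runs out of room quickly.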

Your problem is in spixdf2kasc() in adehabitat, which makes many copies of
the input object. It might even be possible to substitute the

   readGDAL(fnames0[i], silent=TRUE)[[1]]

call for the marked expression in:

  lll <- lapply(1:length(uu), function(i) c(as.matrix(sg[i])))
                                                      ^^^^^

in spixdf2kasc(), which arguably does not use the best syntax for simply
getting the data out of the columns of sg, its copy of ingrid. So
contributing an optimised version of spixdf2kasc() would be helpful. That
said, 2 GB might be enough: my session was swapping at 1.9 GB, but I only
have 1 GB of RAM, so you may well get through. It is mostly a matter of
watching where copying may occur and avoiding it.
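A minimal base-R sketch of the difference between the extraction pattern quoted above and plain [[ indexing. A small data frame, sg_data (a hypothetical name), stands in for the attribute table of ingrid; this is not the adehabitat code itself, and it only checks that both approaches return the same values rather than measuring memory. The point is that c(as.matrix(sg[i])) builds intermediate objects (at least as.matrix() allocates a fresh matrix per layer), while [[i]] returns the column vector directly.

```r
# Hypothetical stand-in for the attribute table of ingrid: one integer
# column per raster layer, kept small so the example runs quickly
set.seed(1)
n_cells <- 1000
sg_data <- as.data.frame(matrix(sample(1:10, n_cells * 3, replace = TRUE),
                                ncol = 3,
                                dimnames = list(NULL, c("kasc1", "kasc2", "kasc3"))))

# Pattern quoted above: sg_data[i] is a one-column data frame,
# as.matrix() turns it into a new matrix, c() then drops the dim attributes
via_matrix <- lapply(seq_along(sg_data), function(i) c(as.matrix(sg_data[i])))

# Lighter alternative: [[i]] returns the column itself, with no
# intermediate data frame or matrix
via_double_bracket <- lapply(seq_along(sg_data), function(i) sg_data[[i]])

# Same values, same type, no extra attributes either way
print(identical(via_matrix, via_double_bracket))
```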

The new Braun & Murdoch introduction to statistical programming with R is
a very useful reference in cases like these. In particular: allocate large
objects once and then fill them in, and if they are already OK, don't
over-check them.

In addition, Dylan and Edzer made good points about how spurious the
apparent resolution may be: suitability comes in patches, doesn't it, and
the outcome won't become more or less significant with a larger n? If you
aren't using proximity, you could just train on a sample from the full
25-layer data set and predict back from the fitted model, couldn't you? The
conversion function to kasc does accept SpatialPixelsDataFrame objects,
but unfortunately promotes them to full grids, so I'm afraid the sample
would need to be a rectangular subset. Maybe try the adelist for more help
on their side; the adehabitat maintainer is helpful when possible.
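A minimal sketch of the train-on-a-sample idea, using a rectangular window so that the sample stays a full grid. Everything here is an assumption for illustration: a small simulated grid instead of the real rasters, cells stored row by row with x (the column index) varying fastest, a made-up response y, and lm() as a stand-in for whatever suitability model is actually used.

```r
# Hypothetical small grid: nx columns by ny rows, cells stored row by row
# with the column index varying fastest (assumed to match the real data)
nx <- 100
ny <- 80
n_layers <- 25
set.seed(1)
layers <- as.data.frame(matrix(runif(nx * ny * n_layers), ncol = n_layers,
                               dimnames = list(NULL, paste("kasc", seq_len(n_layers), sep = ""))))

# Simulated response: a placeholder for the real suitability measure
layers$y <- 2 * layers$kasc1 - layers$kasc2 + rnorm(nx * ny, sd = 0.1)

# Linear cell indices of a rectangular window, index = (row - 1) * nx + col;
# as.vector() on the outer() result keeps the column index varying fastest
rows <- 21:50
cols <- 31:70
idx <- as.vector(outer(cols, (rows - 1) * nx, "+"))
train <- layers[idx, ]

# Fit on the window only, then predict back to every cell of the grid
fit <- lm(y ~ ., data = train)
pred <- predict(fit, newdata = layers)

# Reshape predictions into a ny x nx matrix (row r, column c)
pred_grid <- matrix(pred, nrow = ny, ncol = nx, byrow = TRUE)
print(dim(pred_grid))
```

Choosing the window by row and column ranges, rather than sampling scattered cells, is what keeps the training subset rectangular.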

Hope this helps,

Roger

> --
> Edzer
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



