[R-sig-Geo] Memory limit problems in R / import of maps

Roger Bivand Roger.Bivand at nhh.no
Wed Apr 23 19:58:25 CEST 2008


On Tue, 22 Apr 2008, Roger Bivand wrote:

> On Tue, 22 Apr 2008, Edzer Pebesma wrote:
>
>> Tomislav Hengl wrote:
>>> Just one last thing,
>> Two?
>>> if R is reporting an error message, that does not necessarily mean that there
>>> is a memory limit problem with the machine
>> Correct, the error message should give a hint,
>>> - shouldn't there be a way to implement memory handling
>>> in R in a more efficient way?
>>>
>> R is open source, so go ahead and modify it.
>>
>> As advice, first consider the resources you have, and consider the
>> other options just kindly provided to you. PCs with 8 GB RAM now start
>> at 500 euros, so why process massive data sets on your 2 GB notebook?
>
> Even on my 2001 1GB desktop (dual xeon, but hey, not exactly high end
> now!), reading the 25 1000x1450 rasters went like a song:
>
> library(rgdal)
> grd <- GridTopology(c(0.5, 0.5), c(1,1), c(1000, 1450))
> set.seed(1)
> for (i in 1:25) {
>   dta <- sample(1:10, prod(slot(grd, "cells.dim")), replace=TRUE)
>   SGDF <- SpatialGridDataFrame(grd, data=data.frame(band1=dta))
>   fn <- paste("kasc", i, ".tif", sep="")
>   writeGDAL(SGDF, fn, drivername="GTiff", type="Byte")
> }
> gc()
> fnames0 <- list.files(pattern="^kasc[0-9]+\\.tif$")  # pattern is a regex, not a glob
> fnames <- gsub("\\.tif$", "", fnames0)
> r1 <- readGDAL(fnames0[1], silent=TRUE)
> Grd <- slot(r1, "grid")
> n <- dim(slot(r1, "data"))[1]
> indata <- matrix(0, nrow=n, ncol=length(fnames0))
> for (i in 1:length(fnames0)) {
>   ingrid <- readGDAL(fnames0[i], silent=TRUE)
>   indata[,i] <- ingrid[[1]]
>   cat(i, "\n")
>   gc()
> }
> gc()
> colnames(indata) <- fnames
> str(indata)
> df <- as.data.frame(indata)
> gc()
> rm(indata)
> str(df)
> gc()
> ingrid <- SpatialGridDataFrame(Grd, data=df)
> gc()
> rm(df)
> gc()
>
> library(adehabitat)
> outkasc <- spixdf2kasc(ingrid)
> ...
>
> Your problem is in spixdf2kasc() in adehabitat, which makes many copies of
> the input object. It may even be possible to inject the
>
>   readGDAL(fnames0[i], silent=TRUE)[[1]]
>
> line into:
>
>  lll <- lapply(1:length(uu), function(i) c(as.matrix(sg[i])))
>                                                      ^^^^^
>
> in spixdf2kasc(), which is arguably not using the best syntax for just
> getting the data out of the columns in its copy sg of ingrid. So
> contributing an optimised version of spixdf2kasc would be helpful - but
> maybe 2GB would work - I was swapping at 1.9GB, but I only have 1GB, so
> maybe you'd get through. It's mostly a matter of watching where copying
> may occur and avoiding it.

With a little tidying, spixdf2kasc() will run on 1GB for this 370MB 
SpatialGridDataFrame, taking just another 370MB by copying the data frame 
just once. If anyone would like a copy, please contact me off-line.
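
As a rough illustration of where the extra copies come from (a sketch on a plain data frame, not the adehabitat code itself; `sg` below is a made-up stand-in for the data slot, and the sizes are arbitrary), extracting a column with `[[` returns the vector directly, whereas `c(as.matrix(sg[i]))` first builds a one-column data frame and then a matrix before flattening it:

```r
# Hypothetical stand-in for the data frame inside a SpatialGridDataFrame
set.seed(1)
sg <- data.frame(band1 = sample(1:10, 1e5, replace = TRUE),
                 band2 = sample(1:10, 1e5, replace = TRUE))

# Syntax quoted from spixdf2kasc(): data frame -> matrix -> vector
slow <- lapply(seq_along(sg), function(i) c(as.matrix(sg[i])))

# Direct extraction: no intermediate data frame or matrix
fast <- lapply(seq_along(sg), function(i) sg[[i]])

# Check that both routes return the same values
same_values <- identical(slow, fast)
```

Whether this is the only source of copying in spixdf2kasc() is not something this sketch shows; it only compares the two ways of getting at a column.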

The kasc object is in fact just the SGDF data frame with the rows in 
reversed order, but since enfa() in adehabitat uses a kasc object, you 
probably need to go this way. Probably you'll be using gc() a good deal 
without a little more memory, though.

Getting the output out to an SGDF object ought to be possible too; ask 
about that later if need be.

Roger


>
> The new Braun & Murdoch introduction to statistical programming with R is
> a very useful reference in cases like these - in particular assign large
> objects once and fill them up, and if they are already OK, don't
> over-check them.
>
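The "assign once and fill" advice quoted above can be sketched as follows. This is a minimal example with hypothetical sizes, and `read_layer()` is a made-up placeholder for the `readGDAL()` call in the loop earlier in this thread:

```r
# Hypothetical sizes, much smaller than 25 rasters of 1000 x 1450 cells
n_cells <- 1000
n_layers <- 5

# Made-up stand-in for reading one layer; a real run would call readGDAL()
read_layer <- function(i) rep(as.numeric(i), n_cells)

# Allocate the full matrix once, then fill one column per layer
indata <- matrix(0, nrow = n_cells, ncol = n_layers)
for (i in seq_len(n_layers)) {
  indata[, i] <- read_layer(i)
}

# Growing the object instead reallocates and copies it on every pass
grown <- NULL
for (i in seq_len(n_layers)) {
  grown <- cbind(grown, read_layer(i))
}
```

Both loops end with the same values, but the second one has to copy everything read so far each time a column is added, which is what hurts when memory is tight.
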
> In addition, Dylan and Edzer made good points about the potential
> spuriousness of apparent resolution - suitability is in patches, isn't it,
> and the outcome won't be more or less significant with greater n? If you
> aren't using proximity, you could just train on a sample from the 25-layer
> full data set, and predict back from the fitted model, couldn't you? The
> conversion function to kasc does accept SpatialPixelsDataFrame objects,
> but unfortunately promotes them to full grids, so the sample would need to
> be a rectangular subset, I'm afraid. Maybe try the adelist for more help
> on their side; the adehabitat maintainer is helpful when possible.
>
> Hope this helps,
>
> Roger
>
>> --
>> Edzer
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



