[R-sig-Geo] Memory limit problems in R / import of maps
Roger Bivand
Roger.Bivand at nhh.no
Tue Apr 22 19:29:34 CEST 2008
On Tue, 22 Apr 2008, Edzer Pebesma wrote:
> Tomislav Hengl wrote:
>> Just one last thing,
> Two?
>> if R is reporting an error message, that does not necessarily mean that there
>> is a memory limit problem with the machine
> Correct, the error message should give a hint,
>> - shouldn't there be a way to implement memory handling
>> in R in a more efficient way?
>>
> R is open source, so go ahead and modify it.
>
> As advice: first consider the resources you have, and consider the
> other options just kindly provided to you. PCs with 8 GB RAM now start
> at 500 euros, so why process massive data sets on your 2 GB notebook?
Even on my 2001 1GB desktop (dual Xeon, but hey, not exactly high end
now!), reading the 25 rasters of 1000 x 1450 cells went like a song:
library(rgdal)
grd <- GridTopology(c(0.5, 0.5), c(1,1), c(1000, 1450))
set.seed(1)
for (i in 1:25) {
  dta <- sample(1:10, prod(slot(grd, "cells.dim")), replace=TRUE)
  SGDF <- SpatialGridDataFrame(grd, data=data.frame(band1=dta))
  fn <- paste("kasc", i, ".tif", sep="")
  writeGDAL(SGDF, fn, drivername="GTiff", type="Byte")
}
gc()
# pattern is a regular expression, not a shell glob
fnames0 <- list.files(pattern="^kasc[0-9]+\\.tif$")
fnames <- sub("\\.tif$", "", fnames0)
r1 <- readGDAL(fnames0[1], silent=TRUE)
Grd <- slot(r1, "grid")
n <- dim(slot(r1, "data"))[1]
indata <- matrix(0, nrow=n, ncol=length(fnames0))
for (i in 1:length(fnames0)) {
  ingrid <- readGDAL(fnames0[i], silent=TRUE)
  indata[,i] <- ingrid[[1]]
  cat(i, "\n")
  gc()
}
gc()
colnames(indata) <- fnames
str(indata)
df <- as.data.frame(indata)
gc()
rm(indata)
str(df)
gc()
ingrid <- SpatialGridDataFrame(Grd, data=df)
gc()
rm(df)
gc()
library(adehabitat)
outkasc <- spixdf2kasc(ingrid)
...
Your problem is in spixdf2kasc() in adehabitat, which makes many copies of
the input object. It may even be possible to inject the
readGDAL(fnames0[i], silent=TRUE)[[1]]
line into:
lll <- lapply(1:length(uu), function(i) c(as.matrix(sg[i])))
                                                    ^^^^^
in spixdf2kasc(), which is arguably not the best syntax for simply
extracting the data from the columns of sg, its copy of ingrid. So
contributing an optimised version of spixdf2kasc() would be helpful. Your
2GB may still be enough: I was swapping at 1.9GB with only 1GB of RAM, so
you might get through. It's mostly a matter of watching where copying
may occur and avoiding it.
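To see why a few copies are enough to hurt, here is a back-of-envelope size estimate for the 25-layer data set. It is only a sketch: it assumes 8 bytes per double and 4 bytes per integer, and ignores object overhead and whatever temporary objects spixdf2kasc() itself creates.

```r
# Dimensions taken from the example above: 25 layers of 1000 x 1450 cells.
n_cells  <- 1000 * 1450
n_layers <- 25

# matrix(0, ...) is stored as double, i.e. 8 bytes per value.
bytes_double  <- n_cells * n_layers * 8
# The same values held as integer would need 4 bytes each.
bytes_integer <- n_cells * n_layers * 4

mib_double  <- bytes_double / 2^20
mib_integer <- bytes_integer / 2^20

cat("one copy as double: ", round(mib_double, 1), "MiB\n")
cat("one copy as integer:", round(mib_integer, 1), "MiB\n")
```

At roughly 277 MiB per double copy, four or five simultaneous copies already exceed 1GB.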
The new Braun & Murdoch introduction to statistical programming with R is
a very useful reference in cases like these. In particular: allocate large
objects once and fill them in, and if they are already OK, don't
over-check them.
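A minimal sketch of the "assign once and fill" advice, using base R only. The sizes are deliberately small, and read_layer(), grow_cols() and fill_cols() are made-up names for illustration; they are not from the book or from adehabitat.

```r
# Illustrative sizes only; the real rasters have 1000 * 1450 cells each.
n_cells  <- 1000L
n_layers <- 25L

# Hypothetical stand-in for reading one layer from disk.
read_layer <- function(i) rep(as.numeric(i), n_cells)

# Growing: cbind() allocates a new, larger matrix on every iteration.
grow_cols <- function() {
  out <- NULL
  for (i in seq_len(n_layers)) out <- cbind(out, read_layer(i))
  out
}

# Preallocating: the matrix is allocated once and each column is
# assigned into it.
fill_cols <- function() {
  out <- matrix(0, nrow = n_cells, ncol = n_layers)
  for (i in seq_len(n_layers)) out[, i] <- read_layer(i)
  out
}

a <- grow_cols()
b <- fill_cols()
stopifnot(all(a == b))  # same contents either way
```

The two functions return the same values; the difference is in how many intermediate matrices are allocated along the way, which is what matters once each one is hundreds of megabytes.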
In addition, Dylan and Edzer made good points about the potential
spuriousness of apparent resolution - suitability is in patches, isn't it,
and the outcome won't be more or less significant with greater n? If you
aren't using proximity, you could just train on a sample from the 25-layer
full data set, and predict back from the fitted model, couldn't you? The
conversion function to kasc does accept SpatialPixelsDataFrame objects,
but unfortunately promotes them to full grids, so the sample would need to
be a rectangular subset, I'm afraid. Maybe try the adelist for more help
on their side; the adehabitat maintainer is helpful when possible.
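The train-on-a-sample idea could be sketched as follows. This is an assumption-laden illustration: the data are simulated, the column names x1, x2, y and suit are invented, and glm() is only a stand-in for whatever model is actually being fitted. It does not go through kasc objects, so the rectangular-subset restriction above does not arise here.

```r
set.seed(1)

# Synthetic stand-in for the stacked layers: one row per cell,
# one column per layer (only two layers here).
n_cells <- 5000L
full <- data.frame(x1 = rnorm(n_cells), x2 = rnorm(n_cells))

# Hypothetical presence/absence response, simulated for the sketch only.
full$y <- rbinom(n_cells, 1, plogis(0.5 * full$x1 - full$x2))

# Train on a random sample of cells rather than on every cell.
idx <- sample(n_cells, 500L)
fit <- glm(y ~ x1 + x2, family = binomial, data = full[idx, ])

# Predict back to all cells from the fitted model.
full$suit <- predict(fit, newdata = full, type = "response")
summary(full$suit)
```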
Hope this helps,
Roger
> --
> Edzer
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no