[R-sig-Geo] subs (raster) on large dataset

Lyndon Estes lestes at princeton.edu
Thu Apr 14 16:02:41 CEST 2011


Dear List,

I am using a function to assign the results of a crop model simulation
to a raster, which I then aggregate, and finally resample to the
resolution of another grid. This involves the subs function of raster,
and the raster whose values I am replacing is very large (ncells =
263411880).  Despite this, I have successfully run subs on this same
dataset in the past.  This time, however, it is running for over 9
hours without completing (because I kill R in order get other work
done), and I am on a MacBook Pro with a 2.4 Ghz Core 2 Duo processor
and 4 GB of ram, with R 2.12.2 and raster v 1.8-9. I am thus trying to
figure out if there is something wrong with my coding.  First some
code/results, followed by my specific questions:

Here's the function:

spatCSM <- function(spatialmaster, resamplegrid, res.factor, CSMtable,
LUfield, outfields, outnames, type) {
# Creates spatial output grids from results produced by runCSM. Path
needs to be set in advance
# Args:
#   spatialmaster: A grid defining the location of each unique land
unit (climate-soil combination)
#   resamplegrid: An optional grid defining the resolution to which
results should be resampled
#   resamplefactor: Factor by which to aggregate (e.g. 10 times
current pixel size, the default if not
#     specified)
#   CSMtable: The output table of CSM statistics generated by runCSM
#   LUfield: The field in CSMtable containing the land unit codes that
match values in spatialmaster grid
#   outfields: A vector of column names in CSMtable for which gridded
outputs are wanted, e.g. yield, CV yield
#   outnames: A vector of names for writing output grids, one for each
specifed grid
#   type: Output datatype for grid, e.g. INT2S
# Returns: R format raster grids, saved to disk, of mean yield and
other optional grids

 library(raster)
 if(missing(spatialmaster) | missing(CSMtable) | missing(LUfield) |
    missing(outfields) | missing(type)) {
    stop("Missing parameter, check function list.")
 }
 if(length(outfields) != length(outnames)) {
    stop("Number of names for output grids does not match number of
specified output grids")
 }

 for(i in 1:length(outfields)) {

   #CSMtable[, outfields[i]] <- as.numeric(CSMtable[, outfields[i]])
# Convert to numeric

   out.nm <- paste(outnames[i], ".gri", sep = "")
   subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
                  out.nm, datatype = type, overwrite = TRUE)
   if(!missing(resamplegrid)) {
     cat("Aggregating")
     if(missing(res.factor)) {
       res.factor <- 10
     }
     subs.g <- raster(out.nm)
     out.nm2 <- paste("agg.", out.nm, sep = "")
     aggregate(subs.g, res.factor, fun = mean, na.rm = TRUE,
                             filename = out.nm2, datatype = type,
overwrite = TRUE)
     subs.g.agg <- raster(out.nm2)
     out.nm3 <- paste(outnames[i], ".",
as.integer(res(resamplegrid)[1]), ".gri", sep = "")
     resample(subs.g.agg, resamplegrid, method = "bilinear", filename
= out.nm3, datatype = type,
              overwrite = TRUE)
   }
 }
}

# Function call
spatCSM(spatialmaster = ltq.g, resamplegrid = base.g2, res.factor =
10, CSMtable = sum.tab.c,
       LUfield = "FID", outfields = c("y.mn"), outnames = c("f.yld"),
type = "INT2S")

Inputs

ltq.g
class       : RasterLayer
dimensions  : 15170, 17364, 263411880  (nrow, ncol, ncell)
resolution  : 92.73961, 92.73961  (x, y)
extent      : -733488.1, 876842.5, -3806988, -2400128  (xmin, xmax, ymin, ymax)
projection  : +proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24
+x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
+towgs84=0,0,0
values      : lt.quin.3f.grd
min value   : 1
max value   : 136050

base.g2
class       : RasterLayer
dimensions  : 1651, 1937, 3197987  (nrow, ncol, ncell)
resolution  : 926.6254, 926.6254  (x, y)
extent      : -869378.3, 925495.2, -3887679, -2357820  (xmin, xmax, ymin, ymax)
projection  : +proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24
+x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
+towgs84=0,0,0
values      : in memory
min value   : 1
max value   : 1

sum.tab.c
    FID CR WSTA.... SOIL_ID...     YEARS pre.mn y.mn y.med y.cv y.10
1  16276 MZ AAB_7921 LTAE003301 1980-1998    439 1275  1265   69    0
2  16135 MZ AAB_7921 LTAE003303 1980-1998    422 2071  2295   67  111
3  16820 MZ AAA_7921 LTAE003303 1980-1998    427 2083  2374   66  123
4  16948 MZ AAA_7921 LTAE003304 1980-1998    458 3320  3828   35 1198
5  16149 MZ AAB_7921 LTAE003304 1980-1998    453 3277  3780   37 1086
6  16247 MZ AAB_7921 LTAE003305 1980-1998    423 1420  1645   74    0
7  17414 MZ AAC_7921 LTAE003305 1980-1998    419 1467  1637   68   79
8  17017 MZ AAA_7921 LTAH001401 1980-1998    452 3378  3598   29 1808
9  17016 MZ AAA_7921 LTAH001403 1980-1998    453 3250  3579   32 1507
10 17015 MZ AAA_7921 LTAH001404 1980-1998    425 1607  1918   75    0


Note that I have been testing this function with only a very small
input table providing the values to be substituted (sum.tab.c).  This
is because the crop model itself takes a long while to run for each
location provided in ltq.g.

My first question is whether the small table is causing a problem?  I
figured it wouldn't, since subsWithNA is set to TRUE.  But perhaps it
does.

If that is unlikely, then have I somehow coded this in a way that R is
trying to perform the subs entirely with memory (e.g. because I have
placed subs within a function)?

If neither of these are the case, then I will appreciate any pointers
for how I can make this work more quickly.

I apologize for not providing a fully reproducible example, but the
grid in question is 1 GB, and since the size of it seems to be the
main issue (a version of this grid cut down to the area covered by
sum.tab.c works fine), I don't want to burden anyone with such a large
file.

Thanks in advance for any advice you can provide.

Cheers, Lyndon



-- 
Lyndon Estes
Research Associate
Woodrow Wilson School
Princeton University
+1-609-258-2392 (o)
+1-609-258-6082 (f)
+1-202-431-0496 (m)
lestes at princeton.edu



More information about the R-sig-Geo mailing list