[R-sig-Geo] subs (raster) on large dataset

Lyndon Estes lestes at princeton.edu
Tue Apr 19 21:19:31 CEST 2011


Dear Robert,

Thanks for your response on this (sorry for my slow feedback, I had
another project intervene).

I have gone through the subs code now:

Using my raster ltq.g:

class       : RasterLayer
dimensions  : 15170, 17364, 263411880  (nrow, ncol, ncell)
resolution  : 92.73961, 92.73961  (x, y)
extent      : -733488.1, 876842.5, -3806988, -2400128  (xmin, xmax, ymin, ymax)
projection  : +proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24
+x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
+towgs84=0,0,0
values      : lt.quin.3f.grd
min value   : 1
max value   : 136050

r <- raster(ltq.g)
tr <- blockSize(r)  # n = 267, nrows = 57, size = 57
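
For what it's worth, the block arithmetic can be reproduced with base R alone. This sketch (my own, using only the dimensions of ltq.g above, no raster objects) shows where the 267 blocks of 57 rows come from; the real blockSize() may pick block sizes differently, this just reproduces the arithmetic:

```r
# Reproduce blockSize-style chunking with base R arithmetic only,
# using the dimensions of ltq.g (15170 rows, 57 rows per block).
nr    <- 15170                              # raster rows
size  <- 57                                 # rows per block
n     <- ceiling(nr / size)                 # number of blocks: 267
row   <- seq(1, by = size, length.out = n)  # starting row of each block
nrows <- c(rep(size, n - 1), nr - row[n] + 1)  # last block is shorter (8 rows)
```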

From this test, I get quite variable speeds:

i <- 1
dong <- Sys.time()
v <- getValues(ltq.g, row = tr$row[i], nrows = tr$size)
x <- localmerge(v, y, TRUE)
length(x)  # 989748 = 57 (nrows) * 17364 (ncols)
ding <- Sys.time()
ding-dong
# Times:
# 2.374584 mins, 49.45746 seconds, 44.11 seconds
# After a computer restart: 10.97272 seconds, 14.53 seconds

The last two times are much better, so perhaps my computer had some
performance bottlenecks earlier.  Do these seem reasonable to you,
given my hardware (2.4 GHz Core 2, 4 GB RAM MacBook Pro, 64-bit
R 2.12.2)?

I did catch something potentially important here:

if (canProcessInMemory(x, 3)) {
    v <- localmerge(getValues(x), y, subsWithNA)
    r <- setValues(r, v)
    if (filename != "") {
        r <- writeRaster(r, filename = filename, ...)
    }
    return(r)
}

Testing this on its own:

canProcessInMemory(ltq.g, 3)

Takes anywhere from 7.6 minutes (after computer restart) to 13.2
minutes (last night before restart), and returns "TRUE". Breaking out
the relevant bits of code for canProcessInMemory:

n <- 3 + (nlayers(ltq.g) - 1)
cells <- round(1.1 * ncell(ltq.g))
#if ((cells * n) > .maxmemory()) {
#        return(FALSE)
#    }
# Note: I couldn't figure out how to access .maxmemory() -- it keeps telling
# me "Error: could not find function ".maxmemory"", so presumably it is
# internal to the raster package and I haven't worked out how to reach it.
# Instead I used this:

((cells * n) > as.integer(1e+09))  # 1e+09 is my memory limit.
# > TRUE
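
In case it is useful: unexported objects like .maxmemory can usually be inspected with base R's triple-colon operator or with getAnywhere(); this is a general R mechanism, not anything specific to raster (guarded here so it only runs if raster is installed):

```r
# Two standard ways to inspect unexported functions in a package
# (shown for raster's .maxmemory(); .toDisk() works the same way).
if (requireNamespace("raster", quietly = TRUE)) {
  print(raster:::.maxmemory)        # ::: reaches unexported objects
  print(getAnywhere(".maxmemory"))  # searches all loaded namespaces
}
```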

So it seems my raster might be getting processed entirely in memory.
The slowness of canProcessInMemory then comes from this line:

r <- try(matrix(0.1, ncol = n, nrow = cells), silent = TRUE)
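
A little base R arithmetic (my own, using the cells and n values above) suggests why that trial allocation is so slow on a 4 GB machine -- the test matrix alone is several gigabytes of doubles, which almost certainly pushes the machine into swap:

```r
# Rough memory footprint of the trial allocation in canProcessInMemory():
cells <- round(1.1 * 263411880)  # 289753068, from ncell(ltq.g) above
n     <- 3
bytes <- cells * n * 8           # doubles are 8 bytes each
round(bytes / 2^30, 1)           # ~6.5 GiB, well beyond 4 GB of RAM
```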

There is a line at the beginning of canProcessInMemory that seems
important, but I hit the same problem as with .maxmemory(): I can't
figure out how to see its code or what it does:

 if (.toDisk()) {
        return(FALSE)
    }

Based on its name, I assume it checks whether the parent function is
being directed to write to disk, i.e. whether a filename is supplied.
Is this correct?  And, if the answer is yes, is this further
assumption also correct?

A raster function (e.g. subs) run with an output filename, and without
assignment to an object, must necessarily write to disk, e.g.

subs(x, y, by = "foo", which = "bar", subsWithNA = TRUE,
     filename = out, datatype = type, overwrite = TRUE)

If this last assumption is correct, then it seems that something is
going wrong (or perhaps I have mis-specified something in my code)
such that this large file is being handled in memory when it should be
processed to disk in chunks.
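
Following Robert's pointer to the "chunksize"/"maxmemory" options (?setOptions), an untested workaround sketch might be to shrink maxmemory so that canProcessInMemory() returns FALSE and subs() is forced into the chunked writeValues() path; the option names come from ?setOptions, and the other arguments are just my call from above:

```r
# Untested sketch: force chunked, to-disk processing by lowering the
# maxmemory option before calling subs() (option names per ?setOptions).
library(raster)
setOptions(maxmemory = 1e+07)  # well below cells * n for ltq.g
subs(ltq.g, y, by = "foo", which = "bar", subsWithNA = TRUE,
     filename = out, datatype = type, overwrite = TRUE)
```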

I hope this makes some sense, but I'll appreciate any points of advice on this.

Thanks, Lyndon

p.s. The progress option doesn't work for me with subs, with either
"text" or "window". Is this because I am on a Mac, or does it indicate
something wrong with my installation?

For reference, my current raster options:
format    : raster
datatype  : FLT8S
overwrite : FALSE
progress  : none
timer     : FALSE
chunksize : 1e+06
maxmemory : 1e+09
tmpdir    : /var/folders/Xc/XcJpMYIWFJm5ANTLUN-U2U+++TM/-Tmp-/R_raster_tmp/
setfileext: TRUE



On Thu, Apr 14, 2011 at 1:43 PM, Robert Hijmans <r.hijmans at gmail.com> wrote:
> Lyndon,
>
> I have no idea what might be causing this. I would suggest tracking down
> where exactly this happens by going, line by line, through the code of
> 'subs':
>
> getMethod("subs", c("RasterLayer", "data.frame"))
>
> Presumably, something is not moving here:
>
> for (i in 1:tr$n) {
>     v <- getValues(x, row = tr$row[i], nrows = tr$size)
>     r <- writeValues(r, localmerge(v, y, subsWithNA),  tr$row[i])
>     pbStep(pb)
> }
>
> so you can try
>
>  i <- 1
>  v <- getValues(x, row = tr$row[i], nrows = tr$size)
>  x <- localmerge(v, y, subsWithNA)
>
> and if that last step is indeed very slow, have a step by step look at
> "localmerge".
>
> You can then also see if it matters when you change the "chunksize" and
> "maxmemory" options (?setOptions). These will affect the size of "v" in the
> code above, and perhaps there is a non-linear effect on the performance of
> merge.
>
> Robert
>
>
> --
> View this message in context: http://r-sig-geo.2731867.n2.nabble.com/subs-raster-on-large-dataset-tp6272926p6273741.html
> Sent from the R-sig-geo mailing list archive at Nabble.com.
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



-- 
Lyndon Estes
Research Associate
Woodrow Wilson School
Princeton University
+1-609-258-2392 (o)
+1-609-258-6082 (f)
+1-202-431-0496 (m)
lestes at princeton.edu
