[R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a RasterStack object

Thiago Veloso thi_veloso at yahoo.com.br
Tue May 15 13:27:15 CEST 2012


  Robert and Matteo,

  Thanks for your suggestions. I ended up running the old version of my script, which, after some crop operations, takes 'only' 40 minutes per annual-summation relative to the old 51-minute mark.

  Although your suggestions on memory optimization are really tempting, the server I use is shared with my workgroup and I don't want to create a memory black hole!

  For everyone's appreciation, I am pasting the code in the end of this message.

  All the best,
  Thiago.

# loading required packages (raster will do the most of the processing, 

# rgdal is used to load images, ncdf writes the resulting netcdf files
# and maptools is used to load shapefiles which crops the images.)
library(raster)
library(rgdal)
library(ncdf)
library(maptools)

# set the working directory
setwd("/mnt/disco3/MODIS/MOSAIC/GPP")

# load and project shapefiles to crop images later
sa<-readShapePoly("/mnt/disco3/MODIS/shapes/southamerica.shp")
br<-readShapePoly("/mnt/disco3/MODIS/shapes/brazil.shp")
shape_proj<-raster('/mnt/disco3/MODIS/MOSAIC/GPP/gpp2001001.Gpp_1km.tif')
projection(sa)<-projection(shape_proj)
projection(br)<-projection(shape_proj)

# define a function to perform common tasks
fun <- function(x) {
        x [x > 32760] <- NA #remove unvalid range of values (water, unclassified pixes etc): 32760 for GPP and  for LAI
        x <- x * 0.0001 #scale factor for MODIS GPP product: 0.0001 for GPP and 0.1 for LAI
        x <- sum(x,na.rm=FALSE) #integrating 46 satellite images takes 51 minutes (3,104 seconds) to run
}

# Once the function has been defined, it should be relatively
# straightforward to loop over years to perform annual integration
files <- list.files(pattern='.Gpp_1km.tif',  full.names=TRUE)
years <- substr(files, 6, 9)
for (y in 2001:2011) {
       f <- files[ years == y ]
       # stack one year's all files
       s <- stack(f) #stacking 46 images takes 10 seconds to run
       s <- crop(s,br)
       fout  <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_', y,'.nc', sep='')
       x <- calc(s, fun, filename=fout)
}
________________________________
From: Robert J. Hijmans <r.hijmans at gmail.com>
To: thi_veloso at yahoo.com.br
Cc: r-sig-geo at r-project.org; Matteo Mattiuzzi <matteo.mattiuzzi at boku.ac.at> 
Sent: Monday, May 14, 2012 1:25 PM
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a RasterStack object


Thiago, 

I forgot to mention that you may be able to further speed things up by processing the data in bigger chunks, using 

setOptions(chunksize=   )

The default setting is somewhat conservative, to avoid errors from occurring, but if you want to live in the fast lane, give it a try.

Also, Matteo pointed out to me that  calc(x, sum)  also uses rowSums; so speed should be equivalent for sum and calc. calc has the advantage of being able to set a filename. 

Robert


On Sun, May 13, 2012 at 9:03 AM, Robert J. Hijmans <r.hijmans at gmail.com> wrote:

sum should be faster than calc(x, sum) because sum uses rowSums whereas calc, in this case, would call apply(values(x), 1, sum). calc(x, function(i) rowSums(i)) should be equivalent to sum (in speed and results). 
>
>
>Whether making a RasterBrick is worth it depends on how many times you will use these files for this type of computation. If it is only for one computation it should be slower to first make bricks, as the main bottleneck in these computations is disk I/O. I cannot think of another way to speed things up right now --- something we are working through clustering (probably not too helpful here), and C routines.
>
>
>30 hours is long, but it is the computer that's working, not you.
>
>Robert
>
>
>
>
>On Sun, May 13, 2012 at 4:46 AM, Matteo Mattiuzzi <matteo.mattiuzzi at boku.ac.at> wrote:
>
>Dear Thiago,
>>
>>I don't know what makes sum() more suitable comparing to calc() but I think a fakt that you are useing a stack (and not a brick) influences also the speed of processing:
>>
>>######################
>>
>>library(raster)
>>r <- raster()
>>r[] <- 1:ncell(r)
>>
>>setOptions(todisk=TRUE)
>>system.time(st <- stack(r,r,r,r,r,r,r,r,r))
>>system.time(br <- brick(r,r,r,r,r,r,r,r,r))
>>
>>system.time(sum(st,na.rm=FALSE))
>>system.time(sum(br,na.rm=FALSE))
>>
>>system.time(calc(st,sum,na.rm=FALSE))
>>system.time(calc(br,sum,na.rm=FALSE))
>>
>>setOptions(todisk=FALSE)
>>#######################
>>
>># Of coarse the creation of a brick takes a bit of time since the data needs to be re-organised physically on the disk.
>># But this initial overhead worthwhile relatively quickly, you have to check this.
>>
>># Cheers Matteo
>>
>>
>>
>>
>>>>> Thiago Veloso  13.05.12 1.53 Uhr >>>
>>
>> Robert,
>> Thanks for your input. Your suggestion is more suitable to my case, but it takes almost one hour to sum a stack containing 46 satellite images. Please see below:
>>> sclass       : RasterStack dimensions  : 5568, 8289, 46153152, 46  (nrow, ncol, ncell, nlayers)resolution  : 0.00898, 0.00898  (x, y)extent      : -104.4326, -29.99736, -40.00064, 10  (xmin, xmax, ymin, ymax)coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 min values  : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, ... max values  : 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, ... layer names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, ...
>>
>> For reference, total size of the 46 images which compose the stack reaches 4GB. Here is how long it takes to perform the sum over the stack:
>>> system.time(sos <- sum(s,na.rm=FALSE))    user   system  elapsed  743.098   31.258 3104.113
>> The machine I am using to process this dataset is an 3GHz quadcore with 16GB of RAM, running Opensuse and latest R (2.15.0).
>> As processing my entire data involves performing this operation at least 30 times, is there a way to optimize the process?
>> All the best,  Thiago.
>>
>>--- On Sat, 12/5/12, Robert J. Hijmans  wrote:
>>
>>From: Robert J. Hijmans
>>
>>Subject: Re: [R-sig-Geo] Antw: Error when using rowSums along with a RasterStack object
>>To: thi_veloso at yahoo.com.br
>>Cc: r-sig-geo at r-project.org
>>Date: Saturday, 12 May, 2012, 13:43
>>
>>You can also do
>>sos <- sum(s, na.rm=FALSE)
>>Best, Robert
>>
>>
>>
>>On Sat, May 12, 2012 at 2:26 AM, Matteo Mattiuzzi  wrote:
>>
>>Dear Thiago,
>>
>>
>>
>>is this what you need?
>>
>>####
>>
>>library(raster)
>>
>>r <- raster()
>>
>>r[]<-1:ncell(r)
>>
>>r2 <- raster()
>>
>>r2[]<-ncell(r):1
>>
>>r <-brick(r,r2)
>>
>>plot(r)
>>
>>
>>
>>sos <- calc(r,sum)
>>
>>x11()
>>
>>plot(sos)
>>
>>####
>>
>>Matteo
>>
>>
>>
>>sos <- calc(r,sum)
>>
>>plot(sos)
>>
>>
>>
>>>>> Thiago Veloso  11.05.12 23.29 Uhr >>>
>>
>>Dear R-colleagues,
>>
>>
>>
>>I am trying to sum all layers of a RasterStack object, which is summarized below:
>>
>>
>>
>>> s
>>
>>class       : RasterStack
>>
>>dimensions  : 5568, 8289, 46153152, 46  (nrow, ncol, ncell, nlayers)
>>
>>resolution  : 0.00898, 0.00898  (x, y)
>>
>>extent      : -104.4326, -29.99736, -40.00064, 10  (xmin, xmax, ymin, ymax)
>>
>>coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
>>
>>min values  : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, ...
>>
>>max values  : 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, ...
>>
>>layer names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, ...
>>
>>
>>
>>
>>The raster 's' is composed by 46 satellite images at a resolution of 1km (0.00898degree). Each layer is a 8-day composition of leaf area index and I need to integrate an entire year of LAI. When I try to run the "rowSums" function in this RasterStack, the following error is displayed:
>>
>>
>>
>>
>>> sos <- rowSums(s,na.rm=FALSE)
>>
>>Error in rowSums(s, na.rm = FALSE) :
>>
>> 'x' must be an array of at least two dimensions
>>
>>
>>
>>The same occurs when I try to use .rowSums providing matrix dimensions:
>>
>>
>>
>>> sos <- .rowSums(s,5568,8289)
>>
>>Error in .rowSums(s, 5568, 8289) : 'x' must be numeric
>>
>>
>>
>>
>>
>>What am I doing wrong? Is there another way to integrate the layers of a RasterStack object?
>>
>>
>>
>>Thanks in advance,
>>
>>Thiago Veloso.
>>
>>
>>
>>P.S.: Attaching sessionInfo output to provide useful information:
>>
>>
>>
>>> sessionInfo()
>>
>>R version 2.15.0 (2012-03-30)
>>
>>Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>
>>
>>locale:
>>
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>
>> [7] LC_PAPER=C                 LC_NAME=C
>>
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>
>>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>
>>
>>attached base packages:
>>
>>[1] stats     graphics  grDevices utils     datasets  methods   base
>>
>>
>>
>>other attached packages:
>>
>>[1] ncdf_1.6.6    rgdal_0.7-11  raster_1.9-92 sp_0.9-99
>>
>>
>>
>>loaded via a namespace (and not attached):
>>
>>[1] grid_2.15.0    lattice_0.20-6
>>
>>
>>
>>
>>
>>_______________________________________________
>>
>>R-sig-Geo mailing list
>>
>>R-sig-Geo at r-project.org
>>
>>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>
>>
>>
>>
>>       [[alternative HTML version deleted]]
>>
>>
>>
>>_______________________________________________
>>
>>R-sig-Geo mailing list
>>
>>R-sig-Geo at r-project.org
>>
>>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>
>>
>>
>>
   [[alternative HTML version deleted]]
>>
>>
>>
>>
>>       [[alternative HTML version deleted]]
>>
>>_______________________________________________
>>R-sig-Geo mailing list
>>R-sig-Geo at r-project.org
>>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>




More information about the R-sig-Geo mailing list