[R-sig-Geo] parallel raster processing with calc and mc2d monte carlo simulation

Robert J. Hijmans r.hijmans at gmail.com
Tue Mar 17 18:03:22 CET 2015


The below works for me. You used function 'f' to clusterR, where it
should have been  'calc'

library(raster)
library(snow)
library(mc2d)

f <- function(x) {
  dBC_BA <- mcdata(x[1], type="0")
  dBC_BA_SE <- mcdata(x[2], type = "0")
  SlopePer <-x[3]
  stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE, mean = dBC_BA, sd =
dBC_BA_SE, linf = 0, lhs = FALSE)
  BC_AGWBC <- 0.6419 + 0.9307*stBA + (-0.0176)*SlopePer
  AGWBC <- (0.2626263 * BC_AGWBC + 1)^(1/0.2626263)-1
  quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
}


b <- brick(system.file("external/rlogo.grd", package="raster"))
x <- calc(b, f)

# new function for cluster. You could also rewrite f to include calc
ff <- function(x) calc(x, f)

beginCluster()
y <- clusterR(b, fun=ff, export='f')
endCluster()

On Tue, Mar 17, 2015 at 9:21 AM, Sean Kearney
<sean.kearney at alumni.ubc.ca> wrote:
> Hi Robert,
>
> Thanks for the advice.  So I tried exporting the objects defined in the function (lm.final and lambda_DV) with the following code:
>
> library(raster)
> library(rgdal)
> library(mc2d)
> brick <- brick(BC_BA, BC_BA_SE, SlopePer)  ## Stack three rasters into one RasterBrick
> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ##Crop Raster Brick to manageable size
>
> ####
> #### Predict Biomass using final linear model with Monte Carlo Simulation
> ndunc(101)
> fun.CROP_AGWBC <- function(x) {
>   require(mc2d)
>   dBC_BA <- mcdata(x[[1]], type="0")
>   dBC_BA_SE <- mcdata(x[[2]], type = "0")
>   SlopePer <-x[[3]]
>   stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>                  mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>   BC_AGWBC <- lm.final$coefficients[1] +
>     lm.final$coefficients[2]*stBA +
>     lm.final$coefficients[3]*SlopePer
>   AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>   quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
> }
>
> ####
> #### Check function using calc
> CROP_AGWBC <- calc(testbrick, fun.CROP_AGWBC)
> plot(CROP_AGWBC)
> CROP_AGWBC
>
> ####
> #### Run function using parallel processing
> library(snow)
> beginCluster(8)
> library(parallel)
> cl <- getCluster()
> clusterExport(cl, list("lm.final", "lambda_DV"))
> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
> endCluster()
>
>
> RESULT: Again, the calc process works fine to produce object CROP_AGWBC but when I try to run the last bit with parallel processing, I still get the same error:
>
>> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
> [1] "data should be numeric or logical"
> attr(,"class")
> [1] "snow-try-error" "try-error"
> Error in clusterR(x = testbrick, fun = fun.CROP_AGWBC) : cluster error
>
>
> I was suspicious that maybe the issue was still with the export because while “lambda_DV" is a constant, lm.final is actually an lm model, so I replaced these objects in the function with constants (see code below) but still get the same error:
>
> REVISED CODE WITH CONSTANTS:
> library(raster)
> library(rgdal)
> library(mc2d)
> brick <- brick(BC_BA, BC_BA_SE, SlopePer)  ## Stack three rasters into one RasterBrick
> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ##Crop Raster Brick to manageable size
>
> ####
> #### Predict Biomass using final linear model with Monte Carlo Simulation
> ndunc(101)
> fun2.CROP_AGWBC <- function(x) {
>   require(mc2d)
>   dBC_BA <- mcdata(x[[1]], type="0")
>   dBC_BA_SE <- mcdata(x[[2]], type = "0")
>   SlopePer <-x[[3]]
>   stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>                  mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>   BC_AGWBC <- 0.6419 +
>     0.9307*stBA +
>     (-0.0176)*SlopePer
>   AGWBC <- (0.2626263 * BC_AGWBC + 1)^(1/0.2626263)-1
>   quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
> }
>
> ####
> #### Check function using calc
> CROP_AGWBC_2 <- calc(testbrick, fun2.CROP_AGWBC)
> plot(CROP_AGWBC_2)
> CROP_AGWBC_2
>
> ####
> #### Run function using parallel processing
> library(snow)
> beginCluster(8)
> clusterR(x = testbrick, fun = fun2.CROP_AGWBC)
> endCluster()
>
> RESULT: same error:
>> clusterR(x = testbrick, fun = fun2.CROP_AGWBC)
> [1] "data should be numeric or logical"
> attr(,"class")
> [1] "snow-try-error" "try-error"
> Error in clusterR(x = testbrick, fun = fun2.CROP_AGWBC) : cluster error
>
>
> Any other thoughts?  Many thanks,
>
> sean
>
> On Mar 17, 2015, at 8:14 AM, Robert J. Hijmans <r.hijmans at gmail.com> wrote:
>
>> Sean,
>>
>> fun.CROP_AGWBC  refers to objects that are not defined inside the
>> function ("lm.final" and "lambda_DV"). I assume that this is
>> intentional and that these represent constants; and that they are
>> available in your global environment. If so, you need to export these
>> objects to the cluster nodes. See the 'export' argument in clusterR.
>> You also need to load necessary packages before calling beginCluster
>>
>> Robert
>>
>> On Mon, Mar 16, 2015 at 1:46 PM, spkearney <sean.kearney at alumni.ubc.ca> wrote:
>>> Hello all, and thanks in advance for any and all help you can give on this:
>>>
>>> I have set up a function to extract the 2.5%, 50% and 97.5% percentiles from
>>> a monte carlo simulation on three rasters that is to be called up using
>>> calc() in the raster package and it works great on a test-sized stack/brick,
>>> thanks to suggestions at this post here:
>>> http://grokbase.com/t/r/r-sig-geo/123cb3daaq/apply-monte-carlo-simulation-for-each-cell-in-a-matrix-originally-raster
>>>
>>> My problem, is that I want to run this function on a much larger Raster
>>> Brick that, as written, takes hours to process.  I need to do this multiple
>>> times, so I am trying to speed up the processing using clusterR (or another
>>> option such as rasterEngine with multi-core processing).  However, I can't
>>> get it to work!   Here is the code that works on the test raster brick:
>>>
>>> brick <- brick(BC_BA, BC_BA_SE, SlopePer)  ## Stack three rasters into one
>>> Raster Brick
>>> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ## Crop
>>> brick to manageable size
>>>
>>> ndunc(101)
>>> fun.CROP_AGWBC <- function(x) {
>>>  dBC_BA <- mcdata(x[[1]], type="0")
>>>  dBC_BA_SE <- mcdata(x[[2]], type = "0")
>>>  SlopePer <-x[[3]]
>>>  stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>>>                 mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>>>  BC_AGWBC <- lm.final$coefficients[1] +
>>>    lm.final$coefficients[2]*stBA +
>>>    lm.final$coefficients[3]*SlopePer
>>>  AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>>>  quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>>> }
>>>
>>> CROP_AGWBC <- calc(teststack, fun.CROP_AGWBC) ##Run the calc function
>>> CROP_AGWBC ##Check the result
>>> plot(CROP_AGWBC) ##Plot the three-raster brick result
>>>
>>> ##Extract the individual raster layers
>>> CROP_AGWBC_PRED <- CROP_AGWBC[[2]]
>>> CROP_AGWBC_LWR <- CROP_AGWBC[[1]]
>>> CROP_AGWBC_UPR <- CROP_AGWBC[[3]]
>>>
>>> As I mentioned, the code works great on a small sample.  I tried to speed it
>>> up using clusterR as follows, first testing it on the 'testbrick' Raster
>>> Brick with hopes to use it on the whole Raster Brick:
>>>
>>> beginCluster(8)
>>> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
>>>
>>> and I get the following error:
>>> [1] "data should be numeric or logical"
>>> attr(,"class")
>>> [1] "snow-try-error" "try-error"
>>> Error in clusterR(x = testbrick, fun = fun.CROP_AGWBC) : cluster error
>>>
>>> It is interesting because, if I try to run it again, I get this error
>>> instead:
>>> Error in as.vector((x[, 1] - 1) * ncol(object) + x[, 2]) :
>>>  error in evaluating the argument 'x' in selecting a method for function
>>> 'as.vector': Error in x[, 2] : subscript out of bounds
>>>
>>> I have tried this many different ways, including along the lines of:
>>> f <- function (x) calc(x, fun.CROP_AGWBC)
>>> y <- clusterR(testbrick, f)
>>>
>>> which gives me the same error (more or less) of:
>>> Error in checkForRemoteErrors(lapply(cl, recvResult)) :
>>>  2 nodes produced errors; first error: data should be numeric or logical
>>>
>>> And I have tried using the rasterEngine() function (first without parallel
>>> processing) by changing up the code in two ways, the first being:
>>> ndunc(101)
>>> fun.CROP_AGWBC <- function(x) {
>>>  dBC_BA <- mcdata(x[[1]], type="0")
>>>  dBC_BA_SE <- mcdata(x[[2]], type = "0")
>>>  SlopePer <-x[[3]]
>>>  stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>>>                 mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>>>  BC_AGWBC <- lm.final$coefficients[1] +
>>>    lm.final$coefficients[2]*stBA +
>>>    lm.final$coefficients[3]*SlopePer
>>>  AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>>>  output <- quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>>>  output_array <- array(output,dim=c(dim(x)[1],dim(x)[2],3))
>>>  return(output_array)
>>> }
>>> re <- rasterEngine(x = testbrick, fun = fun.CROP_AGWBC)
>>>
>>> which runs but gives me a 3-layer Raster Brick all with NA's or Inf.  The
>>> second thing I tried used the same fun.CROP_AGWBC function as above, but
>>> with the following rasterEngine code to call up the calc formula:
>>> f <- function(x) {reout <- calc(x, fun.CROP_AGWBC)
>>>                  reout_array <- array(getValues(reout),
>>> dim=c(dim(x)[1],dim(x)[2],3))
>>>                  return(reout_array)
>>> }
>>> re <- rasterEngine(x = testbrick, fun = f, chunk_format = "raster")
>>>
>>> which gives me the following error, even though I thought I converted the
>>> output to an array:
>>> chunk processing units require array vector outputs.  Please check your
>>> function.
>>> Error in focal_hpc_test(x, fun, window_center, window_dims, args,
>>> layer_names,  :
>>>
>>> So, my questions are as follows:
>>> *1) Does anyone know why the clusterR does not work for the calc() function
>>> in my first attempt?  I imagine it has something to do with the conversion
>>> of rasters to mcnodes in the function, but can't figure it out!  Any
>>> suggestions?
>>>
>>> 2) Any thoughts on why I can't get this to work with the rasterEngine()
>>> function?  I am converting the outputs to arrays with the same dimensions as
>>> the input file, but still no luck.*
>>>
>>> Again, any help is much appreciated.  Any suggestions for improving this
>>> question are welcome and I'll do my best to update it - this is my first
>>> post!
>>>
>>> Kind Regards,
>>> sean
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: http://r-sig-geo.2731867.n2.nabble.com/parallel-raster-processing-with-calc-and-mc2d-monte-carlo-simulation-tp7587901.html
>>> Sent from the R-sig-geo mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



More information about the R-sig-Geo mailing list