[R-sig-Geo] parallel raster processing with calc and mc2d monte carlo simulation
Sean Kearney
sean.kearney at alumni.ubc.ca
Tue Mar 17 18:17:15 CET 2015
Hi Robert,
That did the trick! I didn’t realize you need to put the original function in the ‘export=‘ argument. The following code now works:
brick <- brick(BC_BA, BC_BA_SE, SlopePer) ## Stack three rasters into one RasterBrick
testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ##Crop Raster Brick to manageable size
####
#### Predict Biomass using final linear model with Monte Carlo Simulation
ndunc(101)
fun.CROP_AGWBC <- function(x) {
require(mc2d)
dBC_BA <- mcdata(x[[1]], type="0")
dBC_BA_SE <- mcdata(x[[2]], type = "0")
SlopePer <-x[[3]]
stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
BC_AGWBC <- lm.final$coefficients[1] +
lm.final$coefficients[2]*stBA +
lm.final$coefficients[3]*SlopePer
AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
}
####
#### Check function using calc
CROP_AGWBC <- calc(testbrick, fun.CROP_AGWBC)
plot(CROP_AGWBC)
CROP_AGWBC
####
#### Run function using parallel processing
library(snow)
ff <- function(x) calc(x, fun.CROP_AGWBC)
beginCluster(8)
cl <-getCluster()
clusterExport(cl, list("lm.final", "lambda_DV"))
y2 <- clusterR(testbrick, fun = ff, export = "fun.CROP_AGWBC")
endCluster()
plot(y2)
y2
Thanks so much for your help. I’ll try to do a side-by-side and see how much the clusterR() function actually speeds things up. Running calc() without parallel processing was taking ~6hrs per run…
Thanks again!
sean
On Mar 17, 2015, at 10:03 AM, Robert J. Hijmans <r.hijmans at gmail.com> wrote:
> The below works for me. You used function 'f' to clusterR, where it
> should have been 'calc'
>
> library(raster)
> library(snow)
> library(mc2d)
>
> f <- function(x) {
> dBC_BA <- mcdata(x[1], type="0")
> dBC_BA_SE <- mcdata(x[2], type = "0")
> SlopePer <-x[3]
> stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE, mean = dBC_BA, sd =
> dBC_BA_SE, linf = 0, lhs = FALSE)
> BC_AGWBC <- 0.6419 + 0.9307*stBA + (-0.0176)*SlopePer
> AGWBC <- (0.2626263 * BC_AGWBC + 1)^(1/0.2626263)-1
> quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
> }
>
>
> b <- brick(system.file("external/rlogo.grd", package="raster"))
> x <- calc(b, f)
>
> # new function for cluster. You could also rewrite f to include calc
> ff <- function(x) calc(x, f)
>
> beginCluster()
> y <- clusterR(b, fun=ff, export='f')
> endCluster()
>
> On Tue, Mar 17, 2015 at 9:21 AM, Sean Kearney
> <sean.kearney at alumni.ubc.ca> wrote:
>> Hi Robert,
>>
>> Thanks for the advice. So I tried exporting the objects defined in the function (lm.final and lambda_DV) with the following code:
>>
>> library(raster)
>> library(rgdal)
>> library(mc2d)
>> brick <- brick(BC_BA, BC_BA_SE, SlopePer) ## Stack three rasters into one RasterBrick
>> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ##Crop Raster Brick to manageable size
>>
>> ####
>> #### Predict Biomass using final linear model with Monte Carlo Simulation
>> ndunc(101)
>> fun.CROP_AGWBC <- function(x) {
>> require(mc2d)
>> dBC_BA <- mcdata(x[[1]], type="0")
>> dBC_BA_SE <- mcdata(x[[2]], type = "0")
>> SlopePer <-x[[3]]
>> stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>> mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>> BC_AGWBC <- lm.final$coefficients[1] +
>> lm.final$coefficients[2]*stBA +
>> lm.final$coefficients[3]*SlopePer
>> AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>> quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>> }
>>
>> ####
>> #### Check function using calc
>> CROP_AGWBC <- calc(testbrick, fun.CROP_AGWBC)
>> plot(CROP_AGWBC)
>> CROP_AGWBC
>>
>> ####
>> #### Run function using parallel processing
>> library(snow)
>> beginCluster(8)
>> library(parallel)
>> cl <- getCluster()
>> clusterExport(cl, list("lm.final", "lambda_DV"))
>> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
>> endCluster()
>>
>>
>> RESULT: Again, the calc process works fine to produce object CROP_AGWBC but when I try to run the last bit with parallel processing, I still get the same error:
>>
>>> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
>> [1] "data should be numeric or logical"
>> attr(,"class")
>> [1] "snow-try-error" "try-error"
>> Error in clusterR(x = testbrick, fun = fun.CROP_AGWBC) : cluster error
>>
>>
>> I was suspicious that maybe the issue was still with the export because while “lambda_DV" is a constant, lm.final is actually an lm model, so I replaced these objects in the function with constants (see code below) but still get the same error:
>>
>> REVISED CODE WITH CONSTANTS:
>> library(raster)
>> library(rgdal)
>> library(mc2d)
>> brick <- brick(BC_BA, BC_BA_SE, SlopePer) ## Stack three rasters into one RasterBrick
>> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ##Crop Raster Brick to manageable size
>>
>> ####
>> #### Predict Biomass using final linear model with Monte Carlo Simulation
>> ndunc(101)
>> fun2.CROP_AGWBC <- function(x) {
>> require(mc2d)
>> dBC_BA <- mcdata(x[[1]], type="0")
>> dBC_BA_SE <- mcdata(x[[2]], type = "0")
>> SlopePer <-x[[3]]
>> stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>> mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>> BC_AGWBC <- 0.6419 +
>> 0.9307*stBA +
>> (-0.0176)*SlopePer
>> AGWBC <- (0.2626263 * BC_AGWBC + 1)^(1/0.2626263)-1
>> quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>> }
>>
>> ####
>> #### Check function using calc
>> CROP_AGWBC_2 <- calc(testbrick, fun2.CROP_AGWBC)
>> plot(CROP_AGWBC_2)
>> CROP_AGWBC_2
>>
>> ####
>> #### Run function using parallel processing
>> library(snow)
>> beginCluster(8)
>> clusterR(x = testbrick, fun = fun2.CROP_AGWBC)
>> endCluster()
>>
>> RESULT: same error:
>>> clusterR(x = testbrick, fun = fun2.CROP_AGWBC)
>> [1] "data should be numeric or logical"
>> attr(,"class")
>> [1] "snow-try-error" "try-error"
>> Error in clusterR(x = testbrick, fun = fun2.CROP_AGWBC) : cluster error
>>
>>
>> Any other thoughts? Many thanks,
>>
>> sean
>>
>> On Mar 17, 2015, at 8:14 AM, Robert J. Hijmans <r.hijmans at gmail.com> wrote:
>>
>>> Sean,
>>>
>>> fun.CROP_AGWBC refers to objects that are not defined inside the
>>> function ("lm.final" and "lambda_DV"). I assume that this is
>>> intentional and that these represent constants; and that they are
>>> available in your global environment. If so, you need to export these
>>> objects to the cluster nodes. See the 'export' argument in clusterR.
>>> You also need to load necessary packages before calling beginCluster
>>>
>>> Robert
>>>
>>> On Mon, Mar 16, 2015 at 1:46 PM, spkearney <sean.kearney at alumni.ubc.ca> wrote:
>>>> Hello all, and thanks in advance for any and all help you can give on this:
>>>>
>>>> I have set up a function to extract the 2.5%, 50% and 97.5% percentiles from
>>>> a monte carlo simulation on three rasters that is to be called up using
>>>> calc() in the raster package and it works great on a test-sized stack/brick,
>>>> thanks to suggestions at this post here:
>>>> http://grokbase.com/t/r/r-sig-geo/123cb3daaq/apply-monte-carlo-simulation-for-each-cell-in-a-matrix-originally-raster
>>>>
>>>> My problem, is that I want to run this function on a much larger Raster
>>>> Brick that, as written, takes hours to process. I need to do this multiple
>>>> times, so I am trying to speed up the processing using clusterR (or another
>>>> option such as rasterEngine with multi-core processing). However, I can't
>>>> get it to work! Here is the code that works on the test raster brick:
>>>>
>>>> brick <- brick(BC_BA, BC_BA_SE, SlopePer) ## Stack three rasters into one
>>>> Raster Brick
>>>> testbrick <- crop(brick, extent(299700, 300100, 1553550, 1553650)) ## Crop
>>>> brick to manageable size
>>>>
>>>> ndunc(101)
>>>> fun.CROP_AGWBC <- function(x) {
>>>> dBC_BA <- mcdata(x[[1]], type="0")
>>>> dBC_BA_SE <- mcdata(x[[2]], type = "0")
>>>> SlopePer <-x[[3]]
>>>> stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>>>> mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>>>> BC_AGWBC <- lm.final$coefficients[1] +
>>>> lm.final$coefficients[2]*stBA +
>>>> lm.final$coefficients[3]*SlopePer
>>>> AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>>>> quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>>>> }
>>>>
>>>> CROP_AGWBC <- calc(teststack, fun.CROP_AGWBC) ##Run the calc function
>>>> CROP_AGWBC ##Check the result
>>>> plot(CROP_AGWBC) ##Plot the three-raster brick result
>>>>
>>>> ##Extract the individual raster layers
>>>> CROP_AGWBC_PRED <- CROP_AGWBC[[2]]
>>>> CROP_AGWBC_LWR <- CROP_AGWBC[[1]]
>>>> CROP_AGWBC_UPR <- CROP_AGWBC[[3]]
>>>>
>>>> As I mentioned, the code works great on a small sample. I tried to speed it
>>>> up using clusterR as follows, first testing it on the 'testbrick' Raster
>>>> Brick with hopes to use it on the whole Raster Brick:
>>>>
>>>> beginCluster(8)
>>>> clusterR(x = testbrick, fun = fun.CROP_AGWBC)
>>>>
>>>> and I get the following error:
>>>> [1] "data should be numeric or logical"
>>>> attr(,"class")
>>>> [1] "snow-try-error" "try-error"
>>>> Error in clusterR(x = testbrick, fun = fun.CROP_AGWBC) : cluster error
>>>>
>>>> It is interesting because, if I try to run it again, I get this error
>>>> instead:
>>>> Error in as.vector((x[, 1] - 1) * ncol(object) + x[, 2]) :
>>>> error in evaluating the argument 'x' in selecting a method for function
>>>> 'as.vector': Error in x[, 2] : subscript out of bounds
>>>>
>>>> I have tried this many different ways, including along the lines of:
>>>> f <- function (x) calc(x, fun.CROP_AGWBC)
>>>> y <- clusterR(testbrick, f)
>>>>
>>>> which gives me the same error (more or less) of:
>>>> Error in checkForRemoteErrors(lapply(cl, recvResult)) :
>>>> 2 nodes produced errors; first error: data should be numeric or logical
>>>>
>>>> And I have tried using the rasterEngine() function (first without parallel
>>>> processing) by changing up the code in two ways, the first being:
>>>> ndunc(101)
>>>> fun.CROP_AGWBC <- function(x) {
>>>> dBC_BA <- mcdata(x[[1]], type="0")
>>>> dBC_BA_SE <- mcdata(x[[2]], type = "0")
>>>> SlopePer <-x[[3]]
>>>> stBA <- mcstoc(rnorm, type = "U", rtrunc = TRUE,
>>>> mean = dBC_BA, sd = dBC_BA_SE, linf = 0, lhs = FALSE)
>>>> BC_AGWBC <- lm.final$coefficients[1] +
>>>> lm.final$coefficients[2]*stBA +
>>>> lm.final$coefficients[3]*SlopePer
>>>> AGWBC <- (lambda_DV * BC_AGWBC + 1)^(1/lambda_DV)-1
>>>> output <- quantile(AGWBC[], c(0.025, 0.5, 0.975), na.rm=TRUE)
>>>> output_array <- array(output,dim=c(dim(x)[1],dim(x)[2],3))
>>>> return(output_array)
>>>> }
>>>> re <- rasterEngine(x = testbrick, fun = fun.CROP_AGWBC)
>>>>
>>>> which runs but gives me a 3-layer Raster Brick all with NA's or Inf. The
>>>> second thing I tried used the same fun.CROP_AGWBC function as above, but
>>>> with the following rasterEngine code to call up the calc formula:
>>>> f <- function(x) {reout <- calc(x, fun.CROP_AGWBC)
>>>> reout_array <- array(getValues(reout),
>>>> dim=c(dim(x)[1],dim(x)[2],3))
>>>> return(reout_array)
>>>> }
>>>> re <- rasterEngine(x = testbrick, fun = f, chunk_format = "raster")
>>>>
>>>> which gives me the following error, even though I thought I converted the
>>>> output to an array:
>>>> chunk processing units require array vector outputs. Please check your
>>>> function.
>>>> Error in focal_hpc_test(x, fun, window_center, window_dims, args,
>>>> layer_names, :
>>>>
>>>> So, my questions are as follows:
>>>> *1) Does anyone know why the clusterR does not work for the calc() function
>>>> in my first attempt? I imagine it has something to do with the conversion
>>>> of rasters to mcnodes in the function, but can't figure it out! Any
>>>> suggestions?
>>>>
>>>> 2) Any thoughts on why I can't get this to work with the rasterEngine()
>>>> function? I am converting the outputs to arrays with the same dimensions as
>>>> the input file, but still no luck.*
>>>>
>>>> Again, any help is much appreciated. Any suggestions for improving this
>>>> question are welcome and I'll do my best to update it - this is my first
>>>> post!
>>>>
>>>> Kind Regards,
>>>> sean
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://r-sig-geo.2731867.n2.nabble.com/parallel-raster-processing-with-calc-and-mc2d-monte-carlo-simulation-tp7587901.html
>>>> Sent from the R-sig-geo mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
More information about the R-sig-Geo
mailing list