[R-sig-Geo] Parallel processing engine for raster now available on CRAN
Jonathan Greenberg
jgrn at illinois.edu
Mon Feb 25 16:10:20 CET 2013
Folks:
After about a year of working on this package, I finally got
spatial.tools pushed out to CRAN this weekend. The core function
(?focal_hpc) provides an engine for realizing parallel/cluster
processing of raster files with some features to make the development
of the functions easier for users. Among the key features of the
engine:
1) Supports both single-pixel and local window processing (in parallel).
2) Works with any parallel backend that can be registered with foreach
(parallel, snow, Rmpi, multicore, etc.).
3) Supports asynchronous and (OS-willing) parallel writes to the
output file using functionality provided by mmap.
4) Function writing should be intuitive -- for pixel-based functions,
your function will receive an array with dimensions representing
column/row/bands of a chunk, and
should return a chunk of equal columns/rows. For focal functions, the
function will receive a single local window, and should return a
single value or vector of values representing a single pixel (with
potentially multiple bands).
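To make the contract in (4) concrete, here is a minimal base-R sketch that does not use the package at all. The array `chunk` and the functions `band_mean` and `window_mean` are hypothetical stand-ins for illustration; they only show the input and output shapes described above, not how focal_hpc calls user functions internally.

```r
# Hypothetical stand-in for one chunk: 4 columns x 3 rows x 3 bands.
chunk <- array(seq_len(4 * 3 * 3), dim = c(4, 3, 3))

# Pixel-based function: receives the whole chunk and returns an array
# with the same columns/rows (here collapsed to a single output band).
band_mean <- function(x, ...) {
  out <- apply(x, c(1, 2), mean)
  array(out, dim = c(dim(x)[1], dim(x)[2], 1))
}

# Focal function: receives one local window (here 3 x 3 x 3 bands) and
# returns one value per band for the single output pixel.
window_mean <- function(x, ...) {
  apply(x, 3, mean)
}

pixel_out <- band_mean(chunk)

# Cut one 3 x 3 window out of the chunk by hand (assumption: the engine
# would normally do this for every pixel).
window <- chunk[1:3, 1:3, , drop = FALSE]
focal_out <- window_mean(window)

print(dim(pixel_out))
print(focal_out)
```

The pixel-based function keeps the column/row dimensions of its input, while the focal function reduces a whole window to one value per band.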
I've also developed a few convenience functions for quickly
starting/stopping a parallel engine and registering it with foreach
(sfQuickInit and sfQuickStop).
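For readers who have not set up a foreach backend before, the sketch below shows roughly what this kind of start/stop helper involves. It is not the spatial.tools implementation: `quick_init` and `quick_stop` are hypothetical names, and it assumes the doParallel and foreach packages are used for registration when they are installed (otherwise it only starts and stops a plain parallel cluster).

```r
library(parallel)

# Hypothetical helper: start a cluster and, if doParallel is installed,
# register it as the foreach backend.
quick_init <- function(cpus = 2) {
  cl <- makeCluster(cpus)
  if (requireNamespace("doParallel", quietly = TRUE)) {
    doParallel::registerDoParallel(cl)
  }
  cl
}

# Hypothetical helper: stop the cluster and, if foreach is installed,
# fall back to sequential execution.
quick_stop <- function(cl) {
  stopCluster(cl)
  if (requireNamespace("foreach", quietly = TRUE)) {
    foreach::registerDoSEQ()
  }
  invisible(NULL)
}

cl <- quick_init(cpus = 2)
# Run a trivial job on the workers to confirm the cluster is alive.
squares <- parSapply(cl, 1:4, function(i) i^2)
quick_stop(cl)

print(squares)
```

Re-registering a sequential backend on shutdown is what lets foreach-based code keep running after the cluster is gone, only without the parallel speedup.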
As with all things, your mileage with parallel processing may vary.
The test dataset is relatively small, so the overhead of parallel
processing may make it run somewhat longer than you would expect, but
the benefit of this package should (hopefully) be seen with larger
datasets.
I would love it if some r-sig-geo'ers working with larger rasters could
take this package for a test drive and let me know if they are seeing a
benefit, and pass along any suggested improvements/bugs! Cheers!
Here are a few examples:
### spatial.tools
install.packages("spatial.tools")
help(package="spatial.tools")
library("spatial.tools")
# Test file:
tahoe_highrez <- brick(system.file("external/tahoe_highrez.tif",
    package="spatial.tools"))
# Start a parallel engine and register it with foreach (uses all processors):
sfQuickInit()
# NDVI function:
ndvi_function <- function(x,...)
{
    # Note that x is received by the function as a 3-d array:
    red_band <- x[,,2]
    nir_band <- x[,,3]
    ndvi <- (nir_band - red_band)/(nir_band + red_band)
    # The output of the function should also be a 3-d array,
    # even if it is a single band:
    ndvi <- array(ndvi,dim=c(dim(x)[1],dim(x)[2],1))
    return(ndvi)
}
# The first time you run the function, there is an overhead associated with
# loading the packages into the cluster:
system.time(tahoe_ndvi <- focal_hpc(x=tahoe_highrez,fun=ndvi_function))
# Second time will execute faster:
system.time(tahoe_ndvi <- focal_hpc(x=tahoe_highrez,fun=ndvi_function))
plot(tahoe_ndvi)
# Local window smoothing function:
local_smoother <- function(x,...)
{
    # Assumes a 3-d array representing
    # a single local window, and returns
    # a single value or a vector of values.
    smoothed <- apply(x,3,mean)
    return(smoothed)
}
# window_dims gives the local window size, in this case 3 x 3:
system.time(
    tahoe_3x3_smoothed <-
        focal_hpc(x=tahoe_highrez,fun=local_smoother,window_dims=c(3,3))
)
plotRGB(tahoe_3x3_smoothed)
# Same function, used on a 7 x 7 window:
tahoe_7x7_smoothed <-
    focal_hpc(x=tahoe_highrez,fun=local_smoother,window_dims=c(7,7))
plotRGB(tahoe_7x7_smoothed)
# Stop the cluster and re-register foreach with a sequential run:
sfQuickStop()
# Note that focal_hpc will still work even with the cluster stopped,
# albeit (possibly) a lot slower:
system.time(
    tahoe_3x3_smoothed <-
        focal_hpc(x=tahoe_highrez,fun=local_smoother,window_dims=c(3,3))
)
# This took about 4 times longer on my 8-CPU system.
###
I want to thank Robert Hijmans, Roger Bivand, Jeff Ryan (mmap),
and Simon Urbanek, among many others, for helping me get this thing
working!
--jonathan
--
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007