[R-sig-Geo] Parallel predict now in spatial.tools

Tim Howard tghoward at gw.dec.state.ny.us
Wed Mar 26 12:49:42 CET 2014


Jonathan,
Thank you for putting this together and for the example. I'm doing two things differently with randomForest, and I suspect the function isn't handling one of them.
 
First, based on recommendations from Andy Liaw (and ?randomForest), I don't use the formula interface but use x=<many columns>, y=<a column> in the call. Does predict_rasterEngine handle the absence of a formula in the object?
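
The difference between the two interfaces can be sketched on the built-in iris data, which stands in here for the real predictors (an assumption made purely for illustration; ntree=50 is likewise arbitrary). The point of the sketch is that an object fitted through x=/y= carries no terms component, while a formula-fitted object does:

```r
library(randomForest)

set.seed(42)

# Formula interface: the fitted object carries a terms component.
rf_formula <- randomForest(Species ~ ., data = iris, ntree = 50)

# x/y interface: predictors and response are passed separately.
rf_xy <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 50)

has_terms_formula <- !is.null(rf_formula$terms)
has_terms_xy      <- !is.null(rf_xy$terms)

# Both objects still predict from a plain data frame via newdata=.
pred_xy <- predict(rf_xy, newdata = iris[1:5, 1:4], type = "response")
```

Any code that looks for a formula or terms in the model object would therefore see something different in the x=/y= case.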
 
Second, I have many large rasters I want to run the prediction on, so building a brick would be difficult; I use a RasterStack instead. Does your example work with a RasterStack?
 
I can dive deeper if any of this isn't clear, or if these two tweaks work just fine for you. I was trying to swap predict_rasterEngine in for another parallel predict implementation to compare speed; the other version works fine, but predict_rasterEngine bailed on me.
 
Thanks in advance. 
Tim Howard
 
 

>>>>>>
Date: Tue, 18 Mar 2014 22:14:23 -0500
From: Jonathan Greenberg <jgrn at illinois.edu>
To: "r-sig-geo at r-project.org" <R-sig-Geo at r-project.org>
Subject: [R-sig-Geo] Parallel predict now in spatial.tools
Message-ID:
<CABG0rfseg+p0h4HdYOK+_Za=OLMeTKHAT+TQn7g_FkEdYiunFQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

R-sig-geo'ers:

I finally got around to building a parallel predict function,
"predict_rasterEngine", which I've included in version 1.3.7 (or later)
of spatial.tools (check http://r-forge.r-project.org/R/?group_id=1492
for the status of the build).  It should, in theory, be a direct
swap-in for the standard generic predict() function.  Currently, it
will work with any predict.* method that has the following features:
1) The data is passed to the predict method as a data frame via a
newdata parameter, and
2) The data is returned from the predict method as a vector/matrix.
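
One way to check these two conditions for a given model before trying it
on a raster is sketched below.  This is only an illustration using lm()
from base R on made-up data; check_predict_contract is a hypothetical
helper name, not something provided by spatial.tools.

```r
# Hypothetical helper (not part of spatial.tools): checks that predict()
# accepts a data frame via newdata= and returns a vector or matrix.
check_predict_contract <- function(model, newdata, ...) {
  stopifnot(is.data.frame(newdata))
  out <- predict(model, newdata = newdata, ...)
  is.atomic(out) && (is.null(dim(out)) || is.matrix(out))
}

# Made-up training data and a base R model, for illustration only.
set.seed(1)
train <- data.frame(b1 = runif(20), b2 = runif(20))
train$y <- 2 * train$b1 - train$b2 + rnorm(20, sd = 0.1)
fit <- lm(y ~ b1 + b2, data = train)

new_pixels <- data.frame(b1 = c(0.1, 0.5), b2 = c(0.9, 0.2))

# Default predict.lm output is a numeric vector.
ok_vector <- check_predict_contract(fit, new_pixels)
# interval="confidence" makes predict.lm return a matrix.
ok_matrix <- check_predict_contract(fit, new_pixels, interval = "confidence")
# se.fit=TRUE makes predict.lm return a list, which fails the check.
ok_list <- check_predict_contract(fit, new_pixels, se.fit = TRUE)
```

The last call shows the kind of predict output (a list) that falls
outside condition 2.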

When using predict_rasterEngine, the object= parameter is your model,
and the newdata= parameter is the raster/brick/stack to apply the
model to on a pixel-by-pixel basis (note that the names of the layers
must match the names of the predictor variables, in most cases).
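
The name-matching requirement can be checked up front with a few lines
of base R.  The sketch below works on plain character vectors so it
runs without any raster data; missing_layers is a hypothetical helper,
and the layer names are illustrative values only.

```r
# Hypothetical helper: returns the predictor names that the model needs
# but that are absent from the raster's layer names.
missing_layers <- function(predictor_names, layer_names) {
  setdiff(predictor_names, layer_names)
}

# Predictor names as they might appear in a model (illustrative values).
predictors <- c("tahoe_highrez.1", "tahoe_highrez.2", "tahoe_highrez.3")

# Layer names as names() might return them for two different rasters.
layers_ok  <- c("tahoe_highrez.1", "tahoe_highrez.2", "tahoe_highrez.3")
layers_bad <- c("band1", "band2", "band3")

missing_ok  <- missing_layers(predictors, layers_ok)   # nothing missing
missing_bad <- missing_layers(predictors, layers_bad)  # all three missing
```

In a real session the second argument would be names() of the
raster/brick/stack passed as newdata=.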

I was hoping to get some stress-testing on this, since it is a fairly
oft-requested function.  If a predict.* function you'd like to use
doesn't work, let me know which function it is (with some test data)
and I'll see if I can tweak it to work.

Right now, I have confirmed this works with randomForest.  Here's an example:

######################

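# raster supplies brick()/extract() and rgdal supplies readOGR(); they are
# listed explicitly in case they are not already attached.
packages_required <- c("spatial.tools","raster","rgdal","doParallel","randomForest")
lapply(packages_required, require, character.only=TRUE)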

# Load up a 3-band image:
tahoe_highrez <- setMinMax(
brick(system.file("external/tahoe_highrez.tif", package="spatial.tools")))
tahoe_highrez
plotRGB(tahoe_highrez)

# Load up some training points:
tahoe_highrez_training_points <- readOGR(
dsn=system.file("external", package="spatial.tools"),
layer="tahoe_highrez_training_points")

# Extract data to train the randomForest model:
tahoe_highrez_training_extract <- extract(
tahoe_highrez,
tahoe_highrez_training_points,
df=TRUE)

# Fuse it back with the SPECIES info:
tahoe_highrez_training_extract$SPECIES <- tahoe_highrez_training_points$SPECIES

# Note the names of the bands:
names(tahoe_highrez_training_extract) # the extracted data
names(tahoe_highrez) # the brick

# Generate a randomForest model:
tahoe_rf <- randomForest(SPECIES~tahoe_highrez.1+tahoe_highrez.2+tahoe_highrez.3,
data=tahoe_highrez_training_extract)

tahoe_rf

# This will run the predict in parallel:
sfQuickInit()
prediction_rf_class <- predict_rasterEngine(
object=tahoe_rf, newdata=tahoe_highrez, type="response")
prediction_rf_prob <- predict_rasterEngine(
object=tahoe_rf, newdata=tahoe_highrez, type="prob")
sfQuickStop()

###############

--j