[R-sig-Geo] Parallel predict now in spatial.tools

Tim Howard tghoward at gw.dec.state.ny.us
Wed Mar 26 15:48:29 CET 2014


Jonathan,
Thanks for your quick reply. I just tried your Tahoe example using x and y in the call rather than a formula and it worked fine, so I was barking up the wrong tree. Sorry. I'll try to delve deeper and put together a subset example for you to check out if I can't get anywhere. 
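
For anyone following along, here is a minimal sketch of the difference between the two interfaces. It uses the built-in iris data as a stand-in for the Tahoe training extract, and the object names (rf_formula, rf_xy, new_pixels) are made up for illustration. It only exercises the ordinary randomForest predict(), not predict_rasterEngine:

```r
library(randomForest)

set.seed(1)
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Formula interface: the fitted object also has class "randomForest.formula"
rf_formula <- randomForest(Species ~ ., data = iris, ntree = 50)

# x/y interface: predictors and response are passed separately, no formula
rf_xy <- randomForest(x = iris[, predictors], y = iris$Species, ntree = 50)

inherits(rf_formula, "randomForest.formula")
inherits(rf_xy, "randomForest.formula")

# Both objects predict from a data frame passed as newdata;
# the column names must match the predictor names used in training
new_pixels <- iris[1:5, predictors]
pred_formula <- predict(rf_formula, newdata = new_pixels, type = "response")
pred_xy <- predict(rf_xy, newdata = new_pixels, type = "response")
```

As far as I can tell, the x/y object carries no formula terms, so predict() lines up the newdata columns with the training predictors by name. That is why the layer names of the raster need to match.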
 
Thanks,
Tim

>>> Jonathan Greenberg <jgrn at illinois.edu> 3/26/2014 10:23 AM >>>
Hi Tim:

Re: stack, yep, it should work.  In general, a stack performs worse
than a brick because you have to read from multiple files rather than
a single one, but it will still benefit from parallel processing.

Re: formula -- the best way for me to test this is to crop out a piece
of your image and send me the random forest model (use ?save), and the
call you were using.  In theory you should be able to use anything you
would normally use on a data frame, but I'd have to play with it to
confirm!  If you can put those on, e.g., Google Drive, I can test it
out.  Cheers!
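
A rough sketch of what preparing such a subset could look like, assuming the raster and randomForest packages. The synthetic three-layer stack, the layer names, the 10 x 10 pixel window and the file names are all placeholders for illustration, not anything from the actual data:

```r
library(raster)
library(randomForest)

set.seed(42)

# Stand-in for the real image: a small 3-layer stack of random values
template <- raster(nrows = 50, ncols = 50)
full_image <- stack(
  setValues(template, runif(ncell(template))),
  setValues(template, runif(ncell(template))),
  setValues(template, runif(ncell(template))))
names(full_image) <- c("band1", "band2", "band3")

# Stand-in model, trained on randomly sampled pixels with the x/y interface
train_df <- as.data.frame(sampleRandom(full_image, size = 200))
train_class <- factor(ifelse(train_df$band1 > 0.5, "A", "B"))
model_rf <- randomForest(x = train_df, y = train_class, ntree = 50)

# Crop out rows 1-10 and columns 1-10 and write the piece to disk
subset_file <- file.path(tempdir(), "image_subset.grd")
subset_image <- crop(full_image, extent(full_image, 1, 10, 1, 10),
  filename = subset_file, overwrite = TRUE)

# Save the fitted model so it can be restored elsewhere with load()
model_file <- file.path(tempdir(), "model_rf.RData")
save(model_rf, file = model_file)
```

On the receiving end, load(model_file) restores an object named model_rf, and brick(subset_file) reopens the cropped image.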

--j

On Wed, Mar 26, 2014 at 6:49 AM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:
> Jonathan,
> Thank you for putting this together and for the example. I'm doing two
> things differently with randomForest, and I think the function may not be
> handling one of them.
>
> First, based on recommendations from Andy Liaw (and ?randomForest), I don't
> use the formula interface but use x=<many columns>, y=<a column> in the
> call. Does predict_rasterEngine handle the absence of a formula in the
> object?
>
> Second, I have many large rasters I want to run the predict on, so making a
> brick would be difficult. I use a rasterStack instead. Does your example
> work with a rasterStack?
>
> I can dive deeper if any of this isn't clear or if these two tweaks work
> just fine for you. I was just trying to swap out this version of predict
> with another parallel version to evaluate speed and, while the alternate
> version works fine, predict_rasterEngine bailed on me.
>
> Thanks in advance.
> Tim Howard
>
>
>
>>>>>>>
> Date: Tue, 18 Mar 2014 22:14:23 -0500
> From: Jonathan Greenberg <jgrn at illinois.edu>
> To: "r-sig-geo at r-project.org" <R-sig-Geo at r-project.org>
> Subject: [R-sig-Geo] Parallel predict now in spatial.tools
>
> R-sig-geo'ers:
>
> I finally got around to building a parallel predict statement that
> I've included in version 1.3.7 (or later) of spatial.tools (check
> http://r-forge.r-project.org/R/?group_id=1492 for the status of the
> build), "predict_rasterEngine".  It should, in theory, be a direct
> swap-in for the standard generic predict() statement.  Currently, it
> will work on any predict.* statement that has the following features:
> 1) The data is passed to the predict as a data frame via a newdata
> parameter, and
> 2) The data is returned from the predict statement as a vector/matrix.
>
> When using predict_rasterEngine, the object= parameter is your model,
> and the newdata= parameter is the raster/brick/stack to apply the
> model to on a pixel-by-pixel basis (note that the names of the layers
> must match the names of the predictor variables, in most cases).
>
> I was hoping to get some stress-testing on this, since it is a fairly
> oft-requested function.  If a predict.* function you'd like to use
> doesn't work, let me know which function it is (with some test data)
> and I'll see if I can tweak it to work.
>
> Right now, I have confirmed this works with randomForest.  Here's an
> example:
>
> ######################
>
> packages_required <- c("spatial.tools","doParallel","randomForest")
> lapply(packages_required, require, character.only=TRUE)
>
> # Load up a 3-band image:
> tahoe_highrez <- setMinMax(
> brick(system.file("external/tahoe_highrez.tif", package="spatial.tools")))
> tahoe_highrez
> plotRGB(tahoe_highrez)
>
> # Load up some training points:
> tahoe_highrez_training_points <- readOGR(
> dsn=system.file("external", package="spatial.tools"),
> layer="tahoe_highrez_training_points")
>
> # Extract data to train the randomForest model:
> tahoe_highrez_training_extract <- extract(
> tahoe_highrez,
> tahoe_highrez_training_points,
> df=TRUE)
>
> # Fuse it back with the SPECIES info:
> tahoe_highrez_training_extract$SPECIES <-
> tahoe_highrez_training_points$SPECIES
>
> # Note the names of the bands:
> names(tahoe_highrez_training_extract) # the extracted data
> names(tahoe_highrez) # the brick
>
> # Generate a randomForest model:
> tahoe_rf <-
> randomForest(SPECIES~tahoe_highrez.1+tahoe_highrez.2+tahoe_highrez.3,
> data=tahoe_highrez_training_extract)
>
> tahoe_rf
>
> # This will run the predict in parallel:
> sfQuickInit()
> prediction_rf_class <-
> predict_rasterEngine(object=tahoe_rf,newdata=tahoe_highrez,type="response")
> prediction_rf_prob <-
> predict_rasterEngine(object=tahoe_rf,newdata=tahoe_highrez,type="prob")
> sfQuickStop()
>
> ###############
>
> --j



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007