[R-sig-Geo] Parallel predict now in spatial.tools

Jonathan Greenberg jgrn at illinois.edu
Wed Mar 26 15:54:52 CET 2014


Tim:

One thing to check is whether the bands in the stack are named properly
(the layer names need to match the variable names in the formula), e.g.:
names(mystack) <- c("b1","b2",...)
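
A quick way to check this before calling predict is to compare the model's predictor names with the layer names. The following is only a minimal sketch in base R: `missing_layers()` is a hypothetical helper (not part of spatial.tools), `lm()` stands in for the real model, and the `layer_names` vectors stand in for `names(mystack)`.

```r
# Hypothetical helper: which predictors have no layer of the same name?
missing_layers <- function(predictor_names, layer_names) {
  setdiff(predictor_names, layer_names)
}

# Stand-in model fitted through the formula interface.
train <- data.frame(y  = c(1, 2, 3, 5),
                    b1 = c(0.1, 0.4, 0.5, 0.9),
                    b2 = c(3, 1, 4, 1))
fit <- lm(y ~ b1 + b2, data = train)

# Predictor variable names, with the response dropped from the formula.
predictors <- all.vars(delete.response(terms(fit)))

# Default-style layer names do not match, so both predictors are reported.
bad_names <- missing_layers(predictors, c("layer.1", "layer.2"))

# After renaming the layers, nothing is missing.
ok_names <- missing_layers(predictors, c("b1", "b2"))
```

An empty result means every predictor has a matching layer. This only works for models that carry a formula, so a model fitted without one would need its predictor names taken from somewhere else.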

If you can also pass along the error you are getting, that will go a
long way towards figuring out the issue!  Cheers!

--j

On Wed, Mar 26, 2014 at 9:48 AM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:
> Jonathan,
> Thanks for your quick reply. I just tried your Tahoe example using x and y
> in the call rather than a formula and it worked fine, so I was barking up
> the wrong tree. Sorry. I'll try to delve deeper and put together a subset
> example for you to check out if I can't get anywhere.
>
> Thanks,
> Tim
>
>>>> Jonathan Greenberg <jgrn at illinois.edu> 3/26/2014 10:23 AM >>>
>
> Hi Tim:
>
> Re: stack, yep, it should work.  In general, performance is lower with
> stacks, since you have to read from multiple files rather than a single
> one, but they will still benefit from parallel processing.
>
> Re: formula -- the best way for me to test this is to crop out a piece
> of your image and send me the random forest model (use ?save), and the
> call you were using.  In theory you should be able to use anything you
> would normally use on a data frame, but I'd have to play with it to
> confirm!  If you can set up those on e.g. google drive I can test it
> out.  Cheers!
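
For reference, handing over a fitted model with save() could look roughly like the sketch below. It uses base R only: `lm()` stands in for the random forest model, and the file name is a made-up placeholder. The image subset itself would typically be cut with crop() from the raster package, which is not shown here.

```r
# Stand-in model (the real case would be the randomForest object).
train <- data.frame(y  = c(1, 2, 3, 5, 8),
                    b1 = c(0.1, 0.4, 0.5, 0.9, 1.3),
                    b2 = c(3, 1, 4, 1, 5))
fit <- lm(y ~ b1 + b2, data = train)

# Sender: write the model object to a file (placeholder file name).
model_file <- file.path(tempdir(), "model_for_testing.RData")
save(fit, file = model_file)

# Receiver: load() restores the object under its original name.
# Loading into a separate environment avoids overwriting anything.
restored <- new.env()
load(model_file, envir = restored)

# The restored model should give the same predictions as the original.
same_predictions <- all.equal(predict(fit, newdata = train),
                              predict(restored$fit, newdata = train))
```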
>
> --j
>
> On Wed, Mar 26, 2014 at 6:49 AM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:
>> Jonathan,
>> Thank you for putting this together and for the example. I'm doing two
>> things differently with randomForest, and I think the function may not be
>> handling one of them.
>>
>> First, based on recommendations from Andy Liaw (and ?randomForest), I don't
>> use the formula interface but use x=<many columns>, y=<a column> in the
>> call. Does predict_rasterEngine handle the absence of a formula in the
>> object?
>>
>> Second, I have many large rasters I want to run the predict on, so making a
>> brick would be difficult. I use a rasterStack instead. Does your example
>> work with a rasterStack?
>>
>> I can dive deeper if any of this isn't clear or if these two tweaks work
>> just fine for you. I was just trying to swap out this version of predict
>> with another parallel version to evaluate speed and, while the alternate
>> version works fine, predict_rasterEngine bailed on me.
>>
>> Thanks in advance.
>> Tim Howard
>>
>>
>>
>>>>>>>>
>> Date: Tue, 18 Mar 2014 22:14:23 -0500
>> From: Jonathan Greenberg <jgrn at illinois.edu>
>> To: "r-sig-geo at r-project.org" <R-sig-Geo at r-project.org>
>> Subject: [R-sig-Geo] Parallel predict now in spatial.tools
>> Message-ID:
>> <CABG0rfseg+p0h4HdYOK+_Za=OLMeTKHAT+TQn7g_FkEdYiunFQ at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> R-sig-geo'ers:
>>
>> I finally got around to building a parallel predict statement that
>> I've included in version 1.3.7 (or later) of spatial.tools (check
>> http://r-forge.r-project.org/R/?group_id=1492 for the status of the
>> build), "predict_rasterEngine".  It should, in theory, be a direct
>> swap-in for the standard generic predict() statement.  Currently, it
>> will work on any predict.* statement that has the following features:
>> 1) The data is passed to the predict as a data frame via a newdata
>> parameter, and
>> 2) The data is returned from the predict statement as a vector/matrix.
>>
>> When using predict_rasterEngine, the object= parameter is your model,
>> and the newdata= parameter is the raster/brick/stack to apply the
>> model to on a pixel-by-pixel basis (note that the names of the layers
>> must match the names of the predictor variables, in most cases).
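
The two requirements and the pixel-by-pixel idea can be illustrated without any raster package. This is a conceptual sketch only, not how predict_rasterEngine is implemented: a plain array stands in for the image, `lm()` stands in for the model, and the band names `b1`, `b2`, `b3` are made up.

```r
set.seed(42)

# Toy "image": 4 rows x 5 columns x 3 bands, with named bands.
img <- array(runif(4 * 5 * 3), dim = c(4, 5, 3),
             dimnames = list(NULL, NULL, c("b1", "b2", "b3")))

# Stand-in model whose predictor names match the band names.
train <- data.frame(b1 = runif(30), b2 = runif(30), b3 = runif(30))
train$y <- train$b1 + 2 * train$b2 - train$b3
fit <- lm(y ~ b1 + b2 + b3, data = train)

# Flatten the image: one row per pixel, one column per band.
pixels <- as.data.frame(matrix(img, ncol = dim(img)[3],
                               dimnames = list(NULL, dimnames(img)[[3]])))

# Requirement 1: the data goes in as a data frame via newdata.
# Requirement 2: the result comes back as a vector (or matrix).
pred <- predict(fit, newdata = pixels)

# Reshape the per-pixel predictions back to the image's rows and columns.
pred_img <- matrix(pred, nrow = dim(img)[1], ncol = dim(img)[2])
```

Because the columns of `pixels` carry the band names, predict() can match them to the predictors, which is why the layer names have to agree with the model.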
>>
>> I was hoping to get some stress-testing on this, since it is a fairly
>> oft-requested function.  If a predict.* function you'd like to use
>> doesn't work, let me know which function it is (with some test data)
>> and I'll see if I can tweak it to work.
>>
>> Right now, I have confirmed this works with randomForest.  Here's an
>> example:
>>
>> ######################
>>
>> # rgdal provides readOGR(), which is used below to load the training points.
>> packages_required <- c("spatial.tools","doParallel","randomForest","rgdal")
>> lapply(packages_required, require, character.only=TRUE)
>>
>> # Load up a 3-band image:
>> tahoe_highrez <- setMinMax(
>> brick(system.file("external/tahoe_highrez.tif", package="spatial.tools")))
>> tahoe_highrez
>> plotRGB(tahoe_highrez)
>>
>> # Load up some training points:
>> tahoe_highrez_training_points <- readOGR(
>> dsn=system.file("external", package="spatial.tools"),
>> layer="tahoe_highrez_training_points")
>>
>> # Extract data to train the randomForest model:
>> tahoe_highrez_training_extract <- extract(
>> tahoe_highrez,
>> tahoe_highrez_training_points,
>> df=TRUE)
>>
>> # Fuse it back with the SPECIES info:
>> tahoe_highrez_training_extract$SPECIES <-
>> tahoe_highrez_training_points$SPECIES
>>
>> # Note the names of the bands:
>> names(tahoe_highrez_training_extract) # the extracted data
>> names(tahoe_highrez) # the brick
>>
>> # Generate a randomForest model:
>> tahoe_rf <- randomForest(
>> SPECIES~tahoe_highrez.1+tahoe_highrez.2+tahoe_highrez.3,
>> data=tahoe_highrez_training_extract)
>>
>> tahoe_rf
>>
>> # This will run the predict in parallel:
>> sfQuickInit()
>> prediction_rf_class <- predict_rasterEngine(
>> object=tahoe_rf, newdata=tahoe_highrez, type="response")
>> prediction_rf_prob <- predict_rasterEngine(
>> object=tahoe_rf, newdata=tahoe_highrez, type="prob")
>> sfQuickStop()
>>
>> ###############
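
The general idea of a parallel predict (split the pixels into chunks, predict each chunk on a worker, put the results back in order) can be sketched with the base `parallel` package. This is an assumption-laden illustration, not the spatial.tools implementation: `lm()` stands in for the model, a data frame stands in for the image, and the chunk and worker counts are arbitrary.

```r
library(parallel)

set.seed(1)

# Stand-in model and "pixels" (one row per pixel, one column per band).
train <- data.frame(b1 = runif(50), b2 = runif(50))
train$y <- 2 * train$b1 - train$b2 + rnorm(50, sd = 0.01)
fit <- lm(y ~ b1 + b2, data = train)
pixels <- data.frame(b1 = runif(1000), b2 = runif(1000))

# Split the pixels into contiguous chunks so the order is easy to restore.
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(pixels)), breaks = n_chunks, labels = FALSE)
chunks <- split(pixels, chunk_id)

# Predict each chunk on a worker; the model is passed as an extra argument.
cl <- makeCluster(2)
chunk_predictions <- parLapply(cl, chunks, function(chunk, model) {
  predict(model, newdata = chunk)
}, model = fit)
stopCluster(cl)

# Reassemble and compare against a single serial predict.
parallel_pred <- unlist(chunk_predictions, use.names = FALSE)
serial_pred <- unname(predict(fit, newdata = pixels))
results_match <- all.equal(parallel_pred, serial_pred)
```

For a model from another package (randomForest, say), the workers would also need that package loaded before predict() is called on them.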
>>
>> --j
>
>
>
> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 259 Computing Applications Building, MC-150
> 605 East Springfield Avenue
> Champaign, IL  61820-6371
> Phone: 217-300-1924
> http://www.geog.illinois.edu/~jgrn/
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007





