[R-sig-Geo] loops in rasterEngine

Jonathan Greenberg jgrn at illinois.edu
Thu Mar 13 17:17:39 CET 2014


Yan:

Looks like you are getting great help with this. I want to echo
Alex's note that rasterEngine is not a catch-all: for REALLY simple
processes you'll get better performance using calc() or using FEWER
workers (which may seem counterintuitive).  I'm submitting a paper
this week showing that a function that just multiplies a raster by
10 ran faster than calc() only when using 4 workers
(sfQuickInit(cpus=4)) (vs. calc's 1); it was slower than calc with
fewer or more workers.  As a rule, rasterEngine is, at present,
slower than calc when operating in sequential mode.
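The effect of worker count described above can be checked with a small experiment before committing to a full run. What follows is a minimal sketch, not rasterEngine or calc(): it assumes base R's `parallel` package and its fork-based `mclapply()`, uses plain numeric chunks as hypothetical stand-ins for raster blocks, and prints timings that will vary by machine.

```r
## Sketch: time a trivial "multiply by 10" over chunks with different worker
## counts. Plain numeric chunks stand in for raster blocks (an assumption);
## this is not rasterEngine, and the timings are machine-dependent.
library(parallel)

set.seed(1)
values <- runif(4e6)                                   # stand-in cell values
n_chunks <- 16
chunk_id <- ceiling(seq_along(values) / (length(values) / n_chunks))
chunks <- split(values, chunk_id)                      # contiguous blocks, in order

multiply_by_10 <- function(chunk) chunk * 10

## Time one run with a given number of forked workers (mc.cores > 1 needs a
## Unix-alike; on Windows mclapply() only supports mc.cores = 1)
time_workers <- function(workers) {
  elapsed <- system.time(
    res <- mclapply(chunks, multiply_by_10, mc.cores = workers)
  )[["elapsed"]]
  list(workers = workers, elapsed = elapsed,
       result = unlist(res, use.names = FALSE))
}

baseline <- system.time(
  expected <- unlist(lapply(chunks, multiply_by_10), use.names = FALSE)
)[["elapsed"]]

for (w in c(1L, 2L, 4L)) {
  run <- time_workers(w)
  stopifnot(identical(run$result, expected))           # same answer either way
  cat(sprintf("%d worker(s): %.3f s  (plain lapply: %.3f s)\n",
              run$workers, run$elapsed, baseline))
}
```

On a task this light, per-worker start-up and data copying can outweigh the arithmetic itself, which is one plausible reason that adding workers does not always help.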

Now, as an important note, if you grab the latest spatial.tools from
r-forge, I have added a feature that will return multiple rasters at
once, which seems like what you want to do.  You'll want to return a
list-of-arrays (each component will be written to its own raster) and
make sure you specify the output filenames (the components will be
matched against the output filenames).  This may give a
significant speedup because each input raster is read only once and
all the outputs are returned in a single pass (whereas the example
above reads and writes the rasters for every i).
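A per-chunk function for the multi-output feature might look like the sketch below. It assumes, as described above, that the function receives an array and that each component of the returned list becomes one output raster; the name `zone_masks` and its `zone_values` argument are hypothetical, and the rasterEngine call that wires up the output filenames is not shown because its exact arguments are not given in this thread.

```r
## Hypothetical per-chunk function for a multi-output run: returns a list with
## one array per zone value, 1 where the cell equals that value and NA
## elsewhere. The names zone_masks and zone_values are illustrative only.
zone_masks <- function(zones, zone_values = 1:5, ...) {
  lapply(zone_values, function(v) {
    out <- array(NA_real_, dim = dim(zones))  # all NA, same shape as the chunk
    out[which(zones == v)] <- 1               # which() skips NA cells in zones
    out
  })
}

## Usage on a small 3 x 2 x 1 array standing in for one chunk
z <- array(c(1, 2, 3, 1, 5, NA), dim = c(3, 2, 1))
masks <- zone_masks(z)
length(masks)   # one component per zone value
```

Because every mask is computed from the same chunk, the input is read once per block no matter how many outputs are produced.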

--j

On Thu, Mar 13, 2014 at 9:06 AM, Alex Zvoleff <azvoleff at conservation.org> wrote:
> On Wed, Mar 12, 2014 at 11:29 PM, Boulanger, Yan
> <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:
>> Actually, I have several rasters of more than 440,000,000 pixels (MODIS covering all of Canada) and I have a 32-core machine, so I would like to take advantage of it! ;-)
>>
>> Time is money (really?!!)
>
> As mentioned earlier, I would be careful about using rasterEngine for
> this kind of task. It may actually slow you down. I would recommend
> testing on smaller subsets to determine your gains (or losses) from
> doing this type of calculation in parallel versus sequentially. While
> I have seen great speed increases for CPU-intensive calculations from
> using rasterEngine, it sounds like your processing is heavily
> I/O-bound. I am not sure 32 cores will help you unless you have a very
> fast disk or RAID array.
>
> Alex
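To put rough numbers on the I/O concern (and on the read-once saving of a multi-output run), here is back-of-envelope arithmetic for one 440,000,000-pixel raster. The 4 bytes per cell is an assumption, not a property of the actual files, and the figures are estimates rather than measurements.

```r
## Back-of-envelope I/O volume for one 440,000,000-pixel raster.
## Assumption: 4 bytes per cell on disk for the input and every output
## (e.g. a 32-bit integer or float type); adjust for the real data type.
n_cells        <- 440e6
bytes_per_cell <- 4
gb_per_raster  <- n_cells * bytes_per_cell / 1e9   # 1.76 GB

n_outputs <- 5
## One engine call per zone: read the input 5 times, write 5 outputs
loop_gb  <- n_outputs * gb_per_raster + n_outputs * gb_per_raster   # 17.6 GB
## One pass returning all outputs: read the input once, write 5 outputs
multi_gb <- 1 * gb_per_raster + n_outputs * gb_per_raster           # 10.56 GB

cat(sprintf("per raster: %.2f GB, loop: %.2f GB, single pass: %.2f GB\n",
            gb_per_raster, loop_gb, multi_gb))
```

Under these assumptions every zone comparison moves gigabytes while doing almost no arithmetic per cell, which is why disk throughput rather than core count is likely to set the pace.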
>
>>
>> Thanks again!
>> yan
>>
>> Yan Boulanger, Chercheur scientifique / Research scientist
>> Ressources Naturelles Canada, Canadian Forest Service
>> Centre de Foresterie des Laurentides
>> 1055, rue du P.E.P.S.
>> C.P. 10380, succ. Sainte-Foy
>> Québec (Québec) Canada
>> G1V 4C7
>> Tel. : +1 418 649-6859
>>
>> From: Forrest Stevens [mailto:forrest at ufl.edu]
>> Sent: 12 mars 2014 22:25
>> To: Boulanger, Yan
>> Cc: r-sig-geo at r-project.org
>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>
>> Hi Yan, for such a simple process I would be surprised if rasterEngine() were worth the overhead, though admittedly Jonathan Greenberg might know more about this. Here is the approach I would take without using rasterEngine():
>>
>>
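>> for (i in 1:5) {
>>   zone_i <- Safranyik_zones_1961_1990 == i  # TRUE (1) / FALSE (0)
>>   zone_i[zone_i == 0] <- NA                 # keep the 1s, set the rest to NA
>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""), zone_i)
>> }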
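A bare comparison such as `zones == i` yields TRUE/FALSE (1/0) rather than the 1/NA rasters asked for in the original question. The base-R sketch below shows the difference on a small hypothetical matrix standing in for raster values, and collects the masks in a named list instead of using assign().

```r
## Hypothetical 3 x 3 grid standing in for the zone raster's values
zones <- matrix(c(1, 2, 3,
                  4, 5, 1,
                  2, 1, 5), nrow = 3, byrow = TRUE)

## A bare comparison gives TRUE/FALSE, i.e. 1/0 once treated as numbers
print(zones == 1)

## One mask per zone: 1 where the cell equals i, NA elsewhere.
## ifelse() keeps the matrix shape of its test argument.
masks <- lapply(1:5, function(i) ifelse(zones == i, 1, NA))
names(masks) <- paste0("Safranyik_zones_1961_1990b_", 1:5)
print(masks[["Safranyik_zones_1961_1990b_1"]])
```

Keeping the results in one list makes it easy to loop over them later (for example, to write each one to disk) without building variable names with paste().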
>>
>>
>> To do it with rasterEngine(), this is the function definition I would use. This of course requires that you've already created a cluster using one of the supported parallel backends; otherwise you'll gain nothing from parallel processing.
>>
>>
>> library(spatial.tools)
>> library(parallel)    # makeCluster(), stopCluster()
>> library(doParallel)  # registerDoParallel()
>>
>> ## Begin a parallel cluster and register it with foreach:
>> ## The number of nodes/cores to use in the cluster
>> cpus <- 2
>> cl <- makeCluster(spec = cpus, type = "PSOCK", methods = FALSE)
>> ## Register the cluster with foreach:
>> registerDoParallel(cl)
>>
>> ##       Or use the following, quick and dirty way:
>> #sfQuickInit(cpus=2)
>>
>> ## 1 where the zone equals i, NA elsewhere
>> fun_zone <- function(zones, i, ...) {
>>   return(ifelse(zones == i, 1, NA))
>> }
>>
>> ## Non-raster arguments such as i are passed to fun through args=
>> for (j in 1:5) {
>>   assign(paste("Safranyik_zones_1961_1990b_", j, sep=""),
>>          rasterEngine(zones=Safranyik_zones_1961_1990, args=list(i=j), fun=fun_zone))
>> }
>>
>> stopCluster(cl)
>> #sfQuickStop()
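The error in the original question is consistent with `i` never reaching `fun_zone`. The sketch below uses a hypothetical stand-in, `run_chunk()`, that forwards only the chunk and the contents of `args` to `fun`; it is not spatial.tools' internal code, only an illustration of why supplying `i` through `args=` makes a difference.

```r
## Hypothetical stand-in for a block-processing engine: it hands each chunk to
## `fun` as `zones` and splices in the extra arguments listed in `args`.
## This is NOT spatial.tools' code; it only illustrates the args= mechanism.
run_chunk <- function(chunk, fun, args = list()) {
  do.call(fun, c(list(zones = chunk), args))
}

fun_zone <- function(zones, i, ...) {
  ifelse(zones == i, 1, NA)   # 1 where the zone equals i, NA elsewhere
}

chunk <- array(c(1, 2, 3, 3), dim = c(2, 2, 1))

## Works: i is supplied through args
ok <- run_chunk(chunk, fun_zone, args = list(i = 3))

## Fails: nothing supplies i, so evaluating zones == i raises an error
err <- tryCatch(run_chunk(chunk, fun_zone),
                error = function(e) conditionMessage(e))
print(err)
```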
>>
>>
>> Hope this helps,
>> Forrest
>>
>> --
>> Forrest R. Stevens
>> Ph.D. Candidate, QSE3 IGERT Fellow
>> Department of Geography
>> Land Use and Environmental Change Institute
>> University of Florida
>> http://www.clas.ufl.edu/users/forrest
>>
>> On Wed, Mar 12, 2014 at 8:51 PM, Boulanger, Yan <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:
>> Hi folks,
>>
>> I guess I have a lot to learn about writing functions, but I'm stuck using rasterEngine. It seems like it should be very easy, but apparently I'm missing something... I have a raster, Safranyik_zones_1961_1990, with integer values from 1 to 5. I would like to create five rasters in which the value is 1 where Safranyik_zones_1961_1990 equals "i", and NA otherwise. I would like to run everything in a loop. Here's what I thought would work.
>>
>> fun_zone <- function(Safranyik_zones, i, ...) {
>>   Safranyik_zonesb <- Safranyik_zones
>>   Safranyik_zonesb[] <- NA
>>   Safranyik_zonesb[Safranyik_zones == i] <- 1
>>   return(Safranyik_zonesb)
>> }
>>
>> for (i in 1:5) {
>>   Safranyik_zones_1961_1990b <- rasterEngine(Safranyik_zones=Safranyik_zones_1961_1990, i=i, fun=fun_zone)
>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""), Safranyik_zones_1961_1990b[[1]])
>> }
>>
>> Of course, it says that "i" is missing:
>>
>>>Error in Safranyik_zones == i : 'i' is missing
>>
>> Any help?
>>
>> Thanks in advance,
>>
>> Yan
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>
>>
>
>
>
> --
> Alex Zvoleff
> Postdoctoral Associate
> Tropical Ecology Assessment and Monitoring (TEAM) Network
> Conservation International
> 2011 Crystal Dr. Suite 500, Arlington, Virginia 22202, USA
> Tel: +1-703-341-2749, Fax: +1-703-979-0953, Skype: azvoleff
> http://www.teamnetwork.org | http://www.conservation.org
>



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007


