[R-sig-Geo] loops in rasterEngine

Boulanger, Yan Yan.Boulanger at RNCan-NRCan.gc.ca
Thu Mar 13 17:20:30 CET 2014


Many thanks Jonathan, Alex and Forrest,

This is very helpful information. I'll see which works best for my case, calc or rasterEngine.

Sincerely,

Yan

Yan Boulanger, Chercheur scientifique / Research scientist 
Ressources Naturelles Canada, Canadian Forest Service 
Centre de Foresterie des Laurentides 
1055, rue du P.E.P.S.
C.P. 10380, succ. Sainte-Foy
Québec (Québec) Canada
G1V 4C7 
Tel. : +1 418 649-6859 


-----Original Message-----
From: jgrn307 at gmail.com [mailto:jgrn307 at gmail.com] On Behalf Of Jonathan Greenberg
Sent: 13 March 2014 12:18
To: Alex Zvoleff
Cc: Boulanger, Yan; r-sig-geo at r-project.org
Subject: Re: [R-sig-Geo] loops in rasterEngine

Yan:

Looks like you are getting great help with this -- I want to echo Alex's note that rasterEngine is not a catchall -- for REALLY simple processes you'll get better performance using calc() or using FEWER workers (which may seem counterintuitive).  I'm submitting a paper this week that showed that a function that just multiplies a raster by 10 ran faster than calc() only when using 4 workers (sfQuickInit(cpus=4)) (vs. calc's 1), but was slower than calc with fewer or more workers.  As a rule, rasterEngine, at present, is slower than calc when operating in sequential mode.
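
A rough way to see this kind of overhead on your own machine is to time a trivial operation sequentially and then across workers. The sketch below is an illustration only, under stated assumptions: it uses base R's parallel package on a plain matrix standing in for a raster, and the names pixels, times_ten and n_workers are hypothetical. It does not call calc() or rasterEngine, and timings will vary by machine.

```r
library(parallel)  # ships with R; provides makeCluster(), parLapply(), stopCluster()

# Hypothetical stand-in for a raster: a numeric matrix of pixel values.
set.seed(1)
pixels <- matrix(runif(1e6), nrow = 1000)

# Trivial per-pixel operation, in the spirit of "multiply a raster by 10".
times_ten <- function(x) x * 10

# Sequential: a single vectorised call.
t_seq <- system.time(res_seq <- times_ten(pixels))

# Parallel: split the columns into one chunk per worker, process, reassemble.
n_workers <- 2
cl <- makeCluster(n_workers)
chunk_id <- cut(seq_len(ncol(pixels)), n_workers, labels = FALSE)
chunks <- lapply(split(seq_len(ncol(pixels)), chunk_id),
                 function(cols) pixels[, cols, drop = FALSE])
t_par <- system.time(
  res_par <- do.call(cbind, parLapply(cl, unname(chunks), times_ten))
)
stopCluster(cl)

# Elapsed seconds for each approach; both produce the same values.
print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))
```

For an operation this cheap, the time spent shipping chunks to the workers and back can easily outweigh the computation itself, which is the effect described above.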

Now, as an important note, if you grab the latest spatial.tools from r-forge, I have added a feature that will return multiple rasters at once, which seems like what you want to do.  You'll want to return a list-of-arrays (each component will be written to its own raster) and make sure you specify the output filenames (the components will be matched against the output filenames).  This may result in a significant speedup because you are only reading each raster once and returning all the outputs (whereas the loop in Forrest's example reads and writes the raster once for every i).
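
As a minimal sketch of what such a list-of-arrays function could look like for the five zones, assuming the chunk arrives as a numeric array of zone codes: fun_zones_all and zone_ids are hypothetical names, and the exact rasterEngine arguments for supplying the output filenames should be taken from the spatial.tools documentation rather than from this example.

```r
# Hypothetical chunk function: `zones` is assumed to be a numeric array of
# zone codes; `zone_ids` lists the codes to split out (1 to 5 here).
# Returns a list with one array per zone: 1 where the chunk equals that
# zone, NA everywhere else.
fun_zones_all <- function(zones, zone_ids = 1:5, ...) {
  lapply(zone_ids, function(i) {
    out <- array(NA_real_, dim = dim(zones))  # start with NA everywhere
    out[which(zones == i)] <- 1               # which() skips NA pixels
    out
  })
}

# Quick check on a small 2 x 3 x 1 array standing in for a raster chunk.
chunk <- array(c(1, 2, 3, 4, 5, NA), dim = c(2, 3, 1))
masks <- fun_zones_all(chunk)
length(masks)       # one array per zone
masks[[2]][, , 1]   # mask for zone 2
```

The function can be tested on plain arrays like this before it is handed to rasterEngine, which makes it easier to separate logic errors from cluster or I/O problems.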

--j

On Thu, Mar 13, 2014 at 9:06 AM, Alex Zvoleff <azvoleff at conservation.org> wrote:
> On Wed, Mar 12, 2014 at 11:29 PM, Boulanger, Yan 
> <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:
>> Actually, I have several rasters of more than 440 000 000 pixels 
>> (MODIS covering all Canada) and I have a 32-cores machine so I would 
>> like to take advantage of it! ;-)
>>
>> Time is money (really?!!)
>
> As mentioned earlier, I would be careful about using rasterEngine for 
> this kind of task. It may actually slow you down. I would recommend 
> testing on smaller subsets to determine your gains (or losses) from 
> doing this type of calculation in parallel versus sequentially. While 
> I have seen great speed increases for CPU intensive calculations from 
> using rasterEngine, it sounds like your processing is heavily IO 
> intensive. I am not sure 32 cores will help you unless you have a very 
> fast disk or RAID array.
>
> Alex
>
>>
>> Thanks again!
>> yan
>>
>> From: Forrest Stevens [mailto:forrest at ufl.edu]
>> Sent: 12 March 2014 22:25
>> To: Boulanger, Yan
>> Cc: r-sig-geo at r-project.org
>> Subject: Re: [R-sig-Geo] loops in rasterEngine
>>
>> Hi Yan, I would be surprised if rasterEngine() were worth the overhead for such a simple process, though, admittedly, Jonathan Greenberg might have more information on the topic.  This is the approach I would take to do such an operation without using rasterEngine():
>>
>>
>> for (i in 1:5) {
>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""),
>>          Safranyik_zones_1961_1990b == i)
>> }
>>
>>
>> To do it using rasterEngine(), this is the function definition that I would use. This of course requires that you've already created a cluster using one of the various supported parallel backends; otherwise you'll gain nothing from the parallel processing.
>>
>>
>> require("spatial.tools")
>> require("doParallel")  # registerDoParallel(); also attaches parallel for makeCluster()
>>
>> ## Begin a parallel cluster and register it with foreach:
>> ## The number of nodes/cores to use in the cluster
>> cpus = 2
>> cl <- makeCluster(spec = cpus, type = "PSOCK", methods = FALSE)
>> ## Register the cluster with foreach:
>> registerDoParallel(cl)
>>
>> ##       Or use the following, quick and dirty way:
>> #sfQuickInit(cpus=2)
>>
>> fun_zone <- function( zones, i, ...) {
>>   return(zones == i)
>> }
>>
>> for (j in 1:5) {
>>   assign(paste("Safranyik_zones_1961_1990b_", j, sep=""),
>>          rasterEngine(zones=Safranyik_zones_1961_1990b, args=list("i"=j),
>>                       fun=fun_zone))
>> }
>>
>> stopCluster(cl)
>> #sfQuickStop()
>>
>>
>> Hope this helps,
>> Forrest
>>
>> --
>> Forrest R. Stevens
>> Ph.D. Candidate, QSE3 IGERT Fellow
>> Department of Geography
>> Land Use and Environmental Change Institute
>> University of Florida
>> http://www.clas.ufl.edu/users/forrest
>>
>> On Wed, Mar 12, 2014 at 8:51 PM, Boulanger, Yan <Yan.Boulanger at rncan-nrcan.gc.ca> wrote:
>> Hi folks,
>>
>> I guess I have a lot to learn about writing functions, but I'm stuck when using rasterEngine. It seems that it should be very easy to do but I'm missing something, apparently... I have a raster, Safranyik_zones_1961_1990, with integer values from 1 to 5. I would like to create five rasters whose value is 1 where the raster Safranyik_zones_1961_1990 is equal to "i", and NA otherwise. I would like to run everything in a loop. Here's what I thought would be OK.
>>
>> fun_zone <- function(Safranyik_zones,i,...) {
>>   Safranyik_zonesb <- Safranyik_zones
>>   Safranyik_zonesb[] <- NA
>>   Safranyik_zonesb[Safranyik_zones == i] <- 1
>>   return(Safranyik_zonesb)
>> }
>>
>> for (i in 1:5){
>>   Safranyik_zones_1961_1990b <- rasterEngine(Safranyik_zones=Safranyik_zones_1961_1990, i=i, fun=fun_zone)
>>   assign(paste("Safranyik_zones_1961_1990b_", i, sep=""), Safranyik_zones_1961_1990b[[1]])
>> }
>>
>> Of course, it says that "i" is missing:
>>
>>> Error in Safranyik_zones == i : 'i' is missing
>>
>> Any help?
>>
>> Thanks in advance,
>>
>> Yan
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
>
> --
> Alex Zvoleff
> Postdoctoral Associate
> Tropical Ecology Assessment and Monitoring (TEAM) Network
> Conservation International
> 2011 Crystal Dr. Suite 500, Arlington, Virginia 22202, USA
> Tel: +1-703-341-2749, Fax: +1-703-979-0953, Skype: azvoleff 
> http://www.teamnetwork.org | http://www.conservation.org



--
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007


