[R-sig-hpc] How to check if %dopar% really run parallel?

Cedrick W. Johnson cedrick at cedrickjohnson.com
Tue May 4 16:35:08 CEST 2010


Definitely works on an 8-core machine (new shiny toy, oh my!)... I have a
return series for about 40 instruments dating back to 2000. Before getting
snow/foreach/dopar to work, I would run the command:

chart.VaRSensitivity(R, methods=c("HistoricalVaR", "ModifiedVaR", 
"GaussianVaR"), clean="geltner", colorset=bluefocus, lwd=2)

This took a bit of time to go through all the instruments and generate a
VaR sensitivity graph for each.

The following code sped that process up significantly on a single 8-core
machine, to less than 30 seconds by my estimate:

# define the function to run in parallel
run.sens <- function(R) {
    library(PerformanceAnalytics)
    png(file=paste("VAR-Sens-", R, ".png", sep=""), width=500, height=500)
    chart.VaRSensitivity(R, methods=c("HistoricalVaR", "ModifiedVaR", "GaussianVaR"),
                         clean="geltner", colorset=bluefocus, lwd=2)
    dev.off()
}
# let's do it, using the instrument returns
foreach(R=MyGlobalInstruments.returns) %dopar% run.sens(R)
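
Note that the call above assumes a parallel backend has already been
registered; without one, foreach runs %dopar% sequentially and only issues
a warning. A minimal sketch of how one might check this, assuming the
doParallel package (swapped in here for the doSNOW backend used elsewhere
in this thread) and two local workers:

```r
library(foreach)
library(doParallel)  # assumption: doParallel instead of the thread's doSNOW

cl <- parallel::makeCluster(2)   # two local worker processes (assumed count)
registerDoParallel(cl)

backend_registered <- getDoParRegistered()  # TRUE once a backend is registered
n_workers <- getDoParWorkers()              # number of workers foreach will use

# Each task reports the process ID of the R process that ran it.
worker_pids <- foreach(i = 1:8, .combine = c) %dopar% Sys.getpid()
distinct_pids <- unique(worker_pids)

cat("backend:", getDoParName(), " workers:", n_workers, "\n")
cat("worker PIDs:", distinct_pids, " master PID:", Sys.getpid(), "\n")

parallel::stopCluster(cl)
```

If the reported PIDs differ from the master PID, the tasks ran in separate
worker processes rather than in the calling session.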

As some have mentioned, be careful what you choose to parallelize. This
particular example does *not* work well across networked clusters because
I'm creating a .png file for each instrument. It *does*, however, make
sense to run it across the full 8 cores available to me on the local
machine (any number of cores is fine; I run the same routine on a 4-core
Linux box).

          User  System  Elapsed
Before  722.03    1.57   763.18
After     0.04    0.25   572.01
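
The near-zero user time in the "After" row is what one would expect when
timing from the master session: the work is done in the worker processes,
so only the elapsed column reflects it. A small sketch of that kind of
comparison, assuming doParallel as the backend and using Sys.sleep as a
hypothetical stand-in for real work:

```r
library(foreach)
library(doParallel)  # assumption: any registered foreach backend would do

cl <- parallel::makeCluster(2)   # two local workers (assumed count)
registerDoParallel(cl)

# Four tasks of half a second each; Sys.sleep stands in for real work.
t_seq <- system.time(foreach(i = 1:4) %do%    { Sys.sleep(0.5); i })
t_par <- system.time(foreach(i = 1:4) %dopar% { Sys.sleep(0.5); i })

parallel::stopCluster(cl)

print(t_seq)  # sequential run, timed in the master session
print(t_par)  # parallel run; compare the elapsed column with the line above
```

Comparing the elapsed times is the more telling check, since user and
system times here only cover the master process.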

HTH,
cedrick




On 5/4/2010 9:31 AM, Mario Valle wrote:
> *BIG RED FACE*
> I'm ashamed of myself, that was the error!
> A small, stupid pair of parentheses missing.
> Now the parallel version is faster than the serial one, as it should be.
> (serial: 57.41, parallel 2 cores: 39.31)
> Thanks to Stephen and all.
>
> 				mario
>
> Stephen Weston wrote:
>> There is a mistake.  Rather than:
>>
>>      times(10000) %dopar% fun
>>
>> you should write:
>>
>>      times(10000) %dopar% fun()
>>
>> On my machine, "fun" executes in about 0.4 seconds, so executing
>> it 10,000 times should take over an hour to execute.  Your error turned
>> a real program into a toy program.  The error also resulted in more
>> communication, since now the function itself is being returned by the
>> workers.
>>
>> When I ran your benchmark on my machine with 100, rather than 10,000
>> tasks, I got the following results:
>>
>>     user  system elapsed
>>   43.573   0.191  43.823
>>     user  system elapsed
>>    0.093   0.007  24.890
>>
>> That's not so bad.
>>
>> - Steve
>>
>>
>> On Tue, May 4, 2010 at 12:22 AM, Mario Valle<mvalle at cscs.ch>  wrote:
>>> Is there any way to check that %dopar% really runs in parallel?
>>> The following code (on a dual core laptop running Windows+R 2.11.0pat and on
>>> Linux+R2.11.0) runs %dopar% more slowly than the same %do% code.
>>> BTW, if you see any obvious mistake in the code...
>>> Thanks!
>>>                 mario
>>>
>>>
>>> library(doSNOW)
>>> library(foreach)
>>>
>>> fun<- function() for(q in 1:1000000) sqrt(3)
>>>
>>> system.time(times(10000) %do% fun, gcFirst = TRUE)
>>> #   user  system elapsed
>>> #   5.74    0.01    6.24
>>>
>>> cl<- makeCluster(2, type = "SOCK")
>>> registerDoSNOW(cl)
>>>
>>> system.time(times(10000) %dopar% fun, gcFirst = TRUE)
>>> #   user  system elapsed
>>> #   7.89    0.19    9.01
>>>
>>> stopCluster(cl)
>>>
>>> --
>>> Ing. Mario Valle
>>> Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
>>> Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
>>> v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>>
>
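
For anyone reading this later, the difference between `fun` and `fun()`
quoted above is easy to see on a small case. This is only a sketch: the
body of `fun` below is a hypothetical, much cheaper stand-in for the
benchmark function, and %do% is used so that no backend is needed.

```r
library(foreach)

# Hypothetical cheap stand-in for the benchmark function in the thread.
fun <- function() sum(sqrt(1:10))

# Without parentheses each task evaluates the symbol `fun`,
# so it returns the function object and does no work.
without_parens <- times(3) %do% fun

# With parentheses each task actually calls the function.
with_parens <- times(3) %do% fun()

print(is.function(without_parens[[1]]))  # the result holds functions
print(is.numeric(with_parens[[1]]))      # the result holds computed values
```

With a parallel backend the same mistake also means the function object
itself is shipped back from the workers, as noted in the quoted reply.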


