[R-sig-hpc] How to check if %dopar% really run parallel?
Cedrick W. Johnson
cedrick at cedrickjohnson.com
Tue May 4 16:35:08 CEST 2010
Definitely works on an 8-core machine (new shiny toy, oh my!)... I have return
series for about 40 instruments dating back to 2000. Before getting
snow/foreach/%dopar% to work, I would run the command:
chart.VaRSensitivity(R, methods=c("HistoricalVaR", "ModifiedVaR",
"GaussianVaR"), clean="geltner", colorset=bluefocus, lwd=2)
This took a fair bit of time to go through all the instruments and generate a
VaR sensitivity graph.
The following code sped that process up significantly on a single 8-core
machine, to less than 30 seconds by my estimate:
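#define the parallelization function: one VaR sensitivity chart per instrument
run.sens <- function(name, returns) {
  library(PerformanceAnalytics)  # load on each worker; also provides bluefocus
  png(filename=paste("VAR-Sens-", name, ".png", sep=""), width=500,
      height=500)
  chart.VaRSensitivity(returns[, name], methods=c("HistoricalVaR",
    "ModifiedVaR", "GaussianVaR"), clean="geltner", colorset=bluefocus, lwd=2)
  dev.off()
}
#let's do it: one task per instrument (column name), with a backend such as
#doSNOW already registered
foreach(name=colnames(MyGlobalInstruments.returns)) %dopar%
  run.sens(name, MyGlobalInstruments.returns)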
As some have mentioned, be careful what you choose to parallelize. This
particular example does *not* work well across networked clusters because
I'm creating a .png file for each instrument, and each remote worker would
write its files to its own node's disk unless the nodes share a filesystem.
It *does*, however, make sense to run it across all 8 cores available to me
on the local machine (any number of cores works; I run the same routine on
a 4-core Linux box).
Timings in seconds:
                   User  System  Elapsed
Before (serial)  722.03    1.57   763.18
After (%dopar%)    0.04    0.25   572.01
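The Before/After rows mix two effects. The master's User time collapses because the chart work moves into separate worker processes, which system.time() does not measure, while Elapsed falls by a smaller factor (763.18 / 572.01 ≈ 1.33 here). A minimal sketch that reproduces both patterns, plus the one-file-per-instrument output that relies on a disk shared by all local workers, is below. It assumes base R's parallel package in place of snow/doSNOW/foreach, and the simulated returns and run.one() are hypothetical stand-ins for MyGlobalInstruments.returns and the VaR chart.

```r
library(parallel)

# Hypothetical stand-in for MyGlobalInstruments.returns: simulated daily
# returns for 8 instruments, kept in a named list.
set.seed(42)
returns <- setNames(lapply(1:8, function(i) rnorm(2500, sd = 0.01)),
                    paste0("INST", 1:8))

# Hypothetical per-instrument task: bootstrap the 5% quantile (a CPU-bound
# stand-in for the VaR chart) and save it to one file per instrument.
run.one <- function(name, x, out.dir) {
  boot <- vapply(1:500, function(b)
    stats::quantile(sample(x, replace = TRUE), 0.05, names = FALSE),
    numeric(1))
  path <- file.path(out.dir, paste0("VAR-Sens-", name, ".rds"))
  saveRDS(boot, path)
  path
}
out.dir <- tempdir()  # a directory on the local disk, visible to every local worker

# Serial run in the master process.
serial.t <- system.time(
  serial <- mapply(run.one, names(returns), returns,
                   MoreArgs = list(out.dir = out.dir), SIMPLIFY = FALSE))

# Parallel run on local PSOCK workers (capped at 4 for the demo).
n.workers <- max(1L, min(4L, detectCores(), na.rm = TRUE))
cl <- makeCluster(n.workers)
par.t <- system.time(
  par.res <- clusterMap(cl, run.one, names(returns), returns,
                        MoreArgs = list(out.dir = out.dir)))
stopCluster(cl)

# The master's user time drops in the parallel run because the bootstrap
# work is done in the worker processes; elapsed time shows the real gain.
print(rbind(serial = serial.t[1:3], parallel = par.t[1:3]))
```

On a networked cluster the same clusterMap() call would still run, but each file would be written on whichever node executed that task, which is the caveat above.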
HTH,
cedrick
On 5/4/2010 9:31 AM, Mario Valle wrote:
> *BIG RED FACE*
> I'm ashamed of myself, that was the error!
> A small, stupid pair of parentheses missing.
> Now the parallel version is faster than the serial one, as it should be.
> (serial: 57.41, parallel 2 cores: 39.31)
> Thanks to Stephen and all.
>
> mario
>
> Stephen Weston wrote:
>> There is a mistake. Rather than:
>>
>> times(10000) %dopar% fun
>>
>> you should write:
>>
>> times(10000) %dopar% fun()
>>
>> On my machine, "fun" executes in about 0.4 seconds, so executing
>> it 10,000 times should take over an hour (about 4,000 seconds). Your error turned
>> a real program into a toy program. The error also resulted in more
>> communication, since now the function itself is being returned by the
>> workers.
>>
>> When I ran your benchmark on my machine with 100, rather than 10,000
>> tasks, I got the following results:
>>
>> # %do%
>>    user  system elapsed
>>  43.573   0.191  43.823
>> # %dopar%
>>    user  system elapsed
>>   0.093   0.007  24.890
>>
>> That's not so bad.
>>
>> - Steve
>>
>>
>> On Tue, May 4, 2010 at 12:22 AM, Mario Valle <mvalle at cscs.ch> wrote:
>>> Is there any way to check that %dopar% really runs in parallel?
>>> The following code (on a dual-core laptop running Windows + R 2.11.0pat and on
>>> Linux + R 2.11.0) runs %dopar% more slowly than the same %do% code.
>>> BTW, if you see any obvious mistake in the code...
>>> Thanks!
>>> mario
>>>
>>>
>>> library(doSNOW)
>>> library(foreach)
>>>
>>> fun<- function() for(q in 1:1000000) sqrt(3)
>>>
>>> system.time(times(10000) %do% fun, gcFirst = TRUE)
>>> # user system elapsed
>>> # 5.74 0.01 6.24
>>>
>>> cl<- makeCluster(2, type = "SOCK")
>>> registerDoSNOW(cl)
>>>
>>> system.time(times(10000) %dopar% fun, gcFirst = TRUE)
>>> # user system elapsed
>>> # 7.89 0.19 9.01
>>>
>>> stopCluster(cl)
>>>
>>> --
>>> Ing. Mario Valle
>>> Data Analysis and Visualization Group |
>>> http://www.cscs.ch/~mvalle
>>> Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60
>>> v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>>
>
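The question that started this thread, how to tell whether %dopar% really runs in parallel, can be checked directly by having each task report its process ID. The same sketch also reproduces the `fun` vs `fun()` mistake Stephen pointed out. It assumes base R's parallel package instead of doSNOW/foreach (so it runs without extra packages) and shortens Mario's loop to 200,000 iterations so it finishes quickly.

```r
library(parallel)

# Mario's benchmark body, with a shorter loop so the sketch runs quickly.
fun <- function() for (q in 1:200000) sqrt(3)

cl <- makeCluster(2L)      # two local PSOCK worker processes
clusterExport(cl, "fun")   # each worker needs its own copy of fun

# Check 1: which processes ran the tasks? Two distinct PIDs, neither equal
# to the master's, means the work really was split across the workers.
pids <- unlist(parLapply(cl, 1:8, function(i) Sys.getpid()))
print(table(pids))
print(Sys.getpid() %in% pids)

# Check 2: the mistake Stephen spotted. `fun` merely names the function, so
# no loop runs and each task sends the closure back; `fun()` actually calls it.
wrong <- parLapply(cl, 1:8, function(i) fun)
right <- parLapply(cl, 1:8, function(i) fun())
stopCluster(cl)

print(all(vapply(wrong, is.function, logical(1))))
print(all(vapply(right, is.null, logical(1))))  # a for loop returns NULL
```

Wall-clock speedup is the other practical check: Mario's corrected run gives 57.41 / 39.31 ≈ 1.46 on 2 cores, and Stephen's 100-task run gives 43.823 / 24.890 ≈ 1.76.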