[R] %dopar% parallel processing experiment

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Jul 2 20:24:08 CEST 2011



On 02.07.2011 20:04, ivo welch wrote:
> Thank you, Uwe.  This is a little disappointing.  Parallel processing
> for embarrassingly parallel operations (those needing no
> communication) should be feasible if the threads are not created
> and released each time, but held open.  Is there a light-weight
> parallel processing framework that could facilitate this?

Hmmm, now that you ask, I have checked it myself using snow:

On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow 
(using SOCK clusters, i.e. slow communication) I get:



 > system.time(parSapply(cl, 1:A,
 +     function(i) uniroot(minfn, c(1e-20, 9e20), i)))
    user  system elapsed
    3.10    0.19   51.43
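
(The cluster setup itself is not shown above. A minimal sketch of what 
it looks like, assuming two workers and that minfn, A and randvalues 
are defined as in your script:

  > library(snow)
  > cl <- makeCluster(2, type = "SOCK")   ## two workers, socket cluster
  > clusterExport(cl, c("minfn", "A", "randvalues"))  ## copy objects to workers

and stopCluster(cl) when finished.)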

while on a single core without parallelization framework:

 > system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i)))
    user  system elapsed
   93.74    0.09   94.24

Hence it scales very well with snow (although my prior assumption was 
that the overhead would be big for frameworks other than foreach as 
well); perhaps you have to use foreach in a different way?
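
For example, chunking the work so that each %dopar% task evaluates a 
whole block of indices, rather than a single one, cuts the per-task 
communication overhead. An untested sketch, assuming doMC and the 
definitions of minfn, A and randvalues from your script:

    library(foreach)
    library(doMC)
    registerDoMC()

    ## 10 blocks of ~10000 indices each; one %dopar% task per block
    chunks <- split(1:A, cut(1:A, 10))
    discard <- foreach(idx = chunks) %dopar%
        lapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i))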

Best,
Uwe Ligges





>
> regards,
>
> /iaw
>
>
> 2011/7/2 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
>>
>>
>> On 02.07.2011 19:32, ivo welch wrote:
>>>
>>> Dear R experts,
>>>
>>> I am experimenting with multicore processing, so far with pretty
>>> disappointing results.  Here is my simple example:
>>>
>>> A <- 100000
>>> randvalues <- abs(rnorm(A))
>>> ## an arbitrary function:
>>> minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }
>>>
>>> ARGV <- commandArgs(trailingOnly = TRUE)
>>>
>>> if (ARGV[1] == "do-onecore") {
>>>     library(foreach)
>>>     discard <- foreach(i = 1:A) %do% uniroot(minfn, c(1e-20, 9e20), i)
>>> } else if (ARGV[1] == "do-multicore") {
>>>     library(doMC)
>>>     registerDoMC()
>>>     cat("You have", getDoParWorkers(), "cores\n")
>>>     discard <- foreach(i = 1:A) %dopar% uniroot(minfn, c(1e-20, 9e20), i)
>>> } else if (ARGV[1] == "plain") {
>>>     for (i in 1:A) discard <- uniroot(minfn, c(1e-20, 9e20), i)
>>> } else cat("sorry, but argument", ARGV[1],
>>>            "is not plain|do-onecore|do-multicore\n")
>>>
>>>
>>> On my Mac Pro 3,1 (two quad-cores), R 2.12.0, which reports 8 cores:
>>>
>>>    "plain" takes about 68 seconds (real and user, using the Unix time
>>> utility).
>>>    "do-onecore" takes about 300 seconds.
>>>    "do-multicore" takes about 210 seconds real (300 seconds user).
>>>
>>> This seems pretty disappointing.  The cores are not used for the
>>> most part, either.  Feedback appreciated.
>>
>>
>> Feedback is that a single computation within your foreach loop is so
>> quick that the overhead of communicating data and results between
>> processes costs more time than the actual evaluation; hence you are
>> faster with a single process.
>>
>> What you should do is: write code that runs, e.g., 10000 iterations
>> inside each of 10 outer iterations, and put the foreach loop around
>> the outer 10 only. Then you will probably be much faster (untested).
>> But this is essentially the example I use in teaching to show when
>> not to do parallel processing...
>>
>> Best,
>> Uwe Ligges
>>
>>
>>
>>
>>
>>
>>> /iaw
>>>
>>>
>>> ----
>>> Ivo Welch (ivo.welch at gmail.com)
>>>
>>


