[R] %dopar% parallel processing experiment

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Jul 2 20:46:50 CEST 2011



On 02.07.2011 20:42, ivo welch wrote:
> hi uwe--I did not know what snow was.  from my 1-minute reading, it
> seems like a much more involved setup that becomes much more flexible
> once the setup cost has been incurred (specifically, it allows the use
> of many machines).
>
> the attractiveness of the doMC/foreach framework is its simplicity of
> installation and use.
>
> but if I understand what you are telling me, you are using a different
> parallelization framework, and it shows that my example completes a
> lot faster under that framework.  correct?  if so, the problem is my
> use of the doMC framework, not the inherent cost of dealing with
> multiple processes.  is this interpretation correct?


Indeed.

Uwe



> regards,
>
> /iaw
>
> ----
> Ivo Welch (ivo.welch at gmail.com)
> http://www.ivo-welch.info/
>
>
> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>
>>
>> On 02.07.2011 20:04, ivo welch wrote:
>>>
>>> thank you, uwe.  this is a little disappointing.  parallel processing
>>> for embarrassingly parallel operations--those needing no
>>> communication--should be feasible if the worker is not created and
>>> released every time, but held.  is there a light-weight parallel
>>> processing framework that could facilitate this?
>>
>> Hmmm, now that you asked I checked it myself using snow:
>>
>> On a several-years-old 2-core AMD64 machine with R-2.13.0 and snow (using
>> SOCK clusters, i.e. slow communication) I get:
>>
>>
>>
>>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i)))
>>    user  system elapsed
>>    3.10    0.19   51.43
>>
>> while on a single core without parallelization framework:
>>
>>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>>    user  system elapsed
>>   93.74    0.09   94.24
>>
>> Hence (although my prior assumption was that the overhead would be large
>> for frameworks other than foreach as well) it scales perfectly well with
>> snow; perhaps you have to use foreach in a different way?
>>
>> Best,
>> Uwe Ligges
>>
>>
>>
>>
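The snow-style run Uwe describes can be reproduced as a self-contained sketch with base R's `parallel` package, which provides snow-compatible SOCK clusters; the 2-worker cluster size and the reduced problem size here are illustrative, not the thread's original setup:

```r
## Sketch of the snow-style run above, using base R's "parallel" package
## (SOCK clusters, as with snow).  Worker count and A are illustrative.
library(parallel)

A <- 1000                      # reduced from the thread's 100000 for a quick run
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i/A + randvalues[i]

cl <- makeCluster(2)                              # 2-worker SOCK cluster
clusterExport(cl, c("minfn", "randvalues", "A"))  # ship objects to the workers
roots <- parSapply(cl, 1:A, function(i)
  uniroot(minfn, c(1e-20, 9e20), i)$root)         # extra arg i is passed to minfn
stopCluster(cl)
```

Note that `clusterExport` copies `minfn`, `randvalues`, and `A` into each worker's global environment, so the exported `minfn` can find `randvalues[i]` on the worker side.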
>>
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>>
>>> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>>>
>>>>
>>>> On 02.07.2011 19:32, ivo welch wrote:
>>>>>
>>>>> dear R experts---
>>>>>
>>>>> I am experimenting with multicore processing, so far with pretty
>>>>> disappointing results.  Here is my simple example:
>>>>>
>>>>> A <- 100000
>>>>> randvalues <- abs(rnorm(A))
>>>>> minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }
>>>>> ## an arbitrary function
>>>>>
>>>>> ARGV <- commandArgs(trailingOnly = TRUE)
>>>>>
>>>>> ## note: a top-level "else" on its own line is a syntax error in an
>>>>> ## Rscript, so the "} else if" must share a line with the closing brace
>>>>> if (ARGV[1] == "do-onecore") {
>>>>>     library(foreach)
>>>>>     discard <- foreach(i = 1:A) %do% uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else if (ARGV[1] == "do-multicore") {
>>>>>     library(doMC)
>>>>>     registerDoMC()
>>>>>     cat("You have", getDoParWorkers(), "cores\n")
>>>>>     discard <- foreach(i = 1:A) %dopar% uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else if (ARGV[1] == "plain") {
>>>>>     for (i in 1:A) discard <- uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else {
>>>>>     cat("sorry, but argument", ARGV[1],
>>>>>         "is not plain|do-onecore|do-multicore\n")
>>>>> }
>>>>>
>>>>>
>>>>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,
>>>>>
>>>>>    "plain" takes about 68 seconds (real and user, measured with the
>>>>> unix time command).
>>>>>    "do-onecore" takes about 300 seconds.
>>>>>    "do-multicore" takes about 210 seconds real (300 seconds user).
>>>>>
>>>>> this seems pretty disappointing.  the cores sit mostly idle, too.
>>>>> feedback appreciated.
>>>>
>>>>
>>>> Feedback is that a single computation within your foreach loop is so
>>>> quick that the overhead of communicating data and results between
>>>> processes costs more time than the actual evaluation; hence you are
>>>> faster with a single process.
>>>>
>>>> What you should do is:
>>>>
>>>> write code that does, e.g., 10000 iterations within 10 outer iterations,
>>>> and put the foreach loop around the outer 10 only. Then you will
>>>> probably be much faster (untested). But this is essentially the example
>>>> I use in teaching to show when not to do parallel processing...
>>>>
>>>> Best,
>>>> Uwe Ligges
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
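Uwe's chunking advice can be sketched as follows: hand foreach only 10 big tasks instead of 100000 tiny ones, so the per-task communication overhead is paid 10 times rather than 100000 times. This is an untested sketch mirroring the thread's doMC setup; the 10-way split and the reduced problem size are illustrative:

```r
## Sketch of the chunking advice: 10 outer foreach iterations, each doing
## a plain sapply over its chunk of the index range.
library(doMC)       # assumes doMC/foreach are installed, as in the thread
library(foreach)
registerDoMC()

A <- 10000          # reduced from the thread's 100000 for a quick run
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i/A + randvalues[i]

chunks <- split(1:A, cut(1:A, 10, labels = FALSE))   # 10 outer tasks
roots <- foreach(idx = chunks, .combine = c) %dopar%
  sapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i)$root)
```

With doMC the workers are forked from the master, so `minfn` and `randvalues` are visible in each worker without explicit export; only the 10 chunk results travel back between processes.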
>>>>> /iaw
>>>>>
>>>>>
>>>>> ----
>>>>> Ivo Welch (ivo.welch at gmail.com)
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>


