[R] %dopar% parallel processing experiment
Uwe Ligges
ligges at statistik.tu-dortmund.de
Sat Jul 2 20:46:50 CEST 2011
On 02.07.2011 20:42, ivo welch wrote:
> hi uwe--I did not know what snow was. from my 1-minute reading, it
> seems like a much more involved setup that is much more flexible once
> the setup cost has been incurred (specifically, allowing use of many
> machines).
>
> the attractiveness of the doMC/foreach framework is its simplicity of
> installation and use.
>
> but if I understand what you are telling me, you are using a different
> parallelization framework, and it shows that my example completes a
> lot faster under that framework. correct? if so, the problem is my
> use of the doMC framework, not the inherent cost of dealing with
> multiple processes. is this interpretation correct?
Indeed.
Uwe
> regards,
>
> /iaw
>
> ----
> Ivo Welch (ivo.welch at gmail.com)
> http://www.ivo-welch.info/
>
>
> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>
>>
>> On 02.07.2011 20:04, ivo welch wrote:
>>>
>>> thank you, uwe. this is a little disappointing. parallel processing
>>> for embarrassingly parallel operations--those needing no
>>> communication--should be feasible if the workers are not created and
>>> released for every task, but held open. is there light-weight
>>> parallel processing that could facilitate this?
>>
>> Hmmm, now that you ask, I checked it myself using snow:
>>
>> On a somewhat dated 2-core AMD64 machine with R-2.13.0 and snow (using SOCK
>> clusters, i.e. slow communication) I get:
>>
>>
>>
>>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>>    user  system elapsed
>>    3.10    0.19   51.43
>>
>> while on a single core without parallelization framework:
>>
>>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
>>    user  system elapsed
>>   93.74    0.09   94.24
>>
>> Hence (although my prior assumption was that the overhead would be big
>> for frameworks other than foreach as well) it scales perfectly well with
>> snow; perhaps you have to use foreach in a different way?
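[The parSapply() timings above presuppose a cluster object `cl` whose creation is not shown in the thread. A minimal sketch of that setup, using the base `parallel` package, which provides the same makeCluster()/clusterExport()/parSapply() interface as snow; `A` is shrunk here from the original 100000 so the sketch runs quickly:]

```r
library(parallel)

## Recreate the example's objects; A is much smaller than the
## original 100000 so this sketch finishes quickly.
A <- 200
randvalues <- abs(rnorm(A))
minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }

## Start a 2-worker socket cluster (the analogue of snow's "SOCK" type);
## the workers are separate R processes communicating over sockets,
## which is where the communication overhead comes from.
cl <- makeCluster(2)

## Ship the workers everything minfn needs before the parallel apply.
clusterExport(cl, c("minfn", "A", "randvalues"))

res <- parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i))

stopCluster(cl)
```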
>>
>> Best,
>> Uwe Ligges
>>
>>
>>
>>
>>
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>>
>>> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>>>
>>>>
>>>> On 02.07.2011 19:32, ivo welch wrote:
>>>>>
>>>>> dear R experts---
>>>>>
>>>>> I am experimenting with multicore processing, so far with pretty
>>>>> disappointing results. Here is my simple example:
>>>>>
>>>>> A <- 100000
>>>>> randvalues <- abs(rnorm(A))
>>>>> ## an arbitrary function
>>>>> minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }
>>>>>
>>>>> ARGV <- commandArgs(trailingOnly = TRUE)
>>>>>
>>>>> if (ARGV[1] == "do-onecore") {
>>>>>   library(foreach)
>>>>>   discard <- foreach(i = 1:A) %do% uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else if (ARGV[1] == "do-multicore") {
>>>>>   library(doMC)
>>>>>   registerDoMC()
>>>>>   cat("You have", getDoParWorkers(), "cores\n")
>>>>>   discard <- foreach(i = 1:A) %dopar% uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else if (ARGV[1] == "plain") {
>>>>>   for (i in 1:A) discard <- uniroot(minfn, c(1e-20, 9e20), i)
>>>>> } else {
>>>>>   cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")
>>>>> }
>>>>>
>>>>>
>>>>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,
>>>>>
>>>>> "plain" takes about 68 seconds (real and user, timed with the Unix
>>>>> time command).
>>>>> "do-onecore" takes about 300 seconds.
>>>>> "do-multicore" takes about 210 seconds real (300 seconds user).
>>>>>
>>>>> this seems pretty disappointing. the cores are not used for the most
>>>>> part, either. feedback appreciated.
>>>>
>>>>
>>>> Feedback is that a single computation within your foreach loop is so
>>>> quick that the overhead of communicating data and results between
>>>> processes costs more time than the actual evaluation; hence you are
>>>> faster with a single process.
>>>>
>>>> What you should do is:
>>>>
>>>> write code that does, e.g., 10000 iterations within 10 outer
>>>> iterations, and put the foreach loop only around the outer 10. Then
>>>> you will probably be much faster (untested). But this is essentially
>>>> the example I use in teaching to show when not to do parallel
>>>> processing...
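[That suggestion can be sketched as follows: a hypothetical, untimed rewrite of the original script's %dopar% loop. The chunking into 10 blocks via `chunks`/`idx` and the fallback to a sequential backend when doMC is unavailable are illustrative additions, not part of the original thread:]

```r
library(foreach)

## Register a parallel backend as in the original script; fall back to
## the sequential backend so the sketch still runs where doMC is missing.
if (requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC()
} else {
  registerDoSEQ()
}

A <- 1000
randvalues <- abs(rnorm(A))
minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }

## Split 1:A into 10 chunks and parallelize over the chunks only, so each
## worker performs A/10 uniroot() calls per round of communication
## instead of one.
chunks <- split(1:A, cut(1:A, 10, labels = FALSE))

discard <- foreach(idx = chunks) %dopar%
  lapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i))
```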
>>>>
>>>> Best,
>>>> Uwe Ligges
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> /iaw
>>>>>
>>>>>
>>>>> ----
>>>>> Ivo Welch (ivo.welch at gmail.com)
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>