[R] More than doubling performance with snow
Markus Schmidberger
schmidb at ibe.med.uni-muenchen.de
Mon Nov 24 18:40:17 CET 2008
Hi,
there is a new mailing list for R and HPC: r-sig-hpc at r-project.org
This is probably a better list for this question. Remember that you
have to subscribe first: https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
In this case the communication overhead is the problem. The data /
matrix is too big!
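
To get a rough feel for how much data has to travel to a worker per task,
one can serialize a matrix and look at its size. This is only a sketch:
the 500 x 500 matrix and the names n, m and payload_bytes are made up
here and are not taken from the original code.

```r
# Hypothetical example matrix; the matrix in the original code may differ.
n <- 500
m <- matrix(rnorm(n * n), nrow = n)

# Objects are shipped to the workers in serialized form, so the length
# of the serialized object approximates the bytes sent per task.
payload_bytes <- length(serialize(m, NULL))
cat("approx. payload per task:", round(payload_bytes / 2^20, 1), "MB\n")

# For comparison: elapsed time of the actual computation done locally.
print(system.time(solve(m))["elapsed"])
```

If moving the payload takes about as long as the computation itself,
parallelisation will probably not pay off.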
Have a look at the function snow.time to visualize your communication
and calculation time. It is a new function in snow_0.3-4.
( http://www.cs.uiowa.edu/~luke/R/cluster/ )
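
A minimal sketch of how snow.time might be used, assuming snow >= 0.3-4
is installed and a two-worker socket cluster on the local machine. The
workload (inverting a few random 300 x 300 matrices) and the names cl,
mats and tm are made up for illustration.

```r
library(snow)  # snow.time() needs snow >= 0.3-4

# Assumed setup: two local worker processes connected via sockets.
cl <- makeCluster(2, type = "SOCK")

# Hypothetical workload: a list of random matrices to invert.
mats <- replicate(4, matrix(rnorm(300 * 300), nrow = 300), simplify = FALSE)

# Record the timing of the cluster call.
tm <- snow.time(clusterApply(cl, mats, solve))

print(tm)  # timing summary
plot(tm)   # chart of node activity over elapsed time

stopCluster(cl)
```

Time spent sending and receiving data that is large compared with the
compute time on the nodes would point to communication overhead.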
Best
Markus
Stefan Evert wrote:
>
>> I'm sorry but I don't quite understand what "not running solve() in
>> this process" means. I updated the code and it does show that the results
>> from clusterApply() are identical with the results from lapply(). Could
>> you please explain more about this?
>
> The point is that a parallel processing framework such as Snow or PVM does
> not execute the operation in your (interactive) R session, but rather
> starts separate computing processes that carry out the actual
> calculation (while your R session is just waiting for the results to
> become available). These separate processes can either run on different
> computers in a network, or on your local machine (in order to make use
> of multiple CPU cores).
>
>>>> user system elapsed
>>>> 0.584 0.144 4.355
>
>>>> user system elapsed
>>>> 4.777 0.100 4.901
>
>
> If you take a close look at your timing results, you can see that the
> total processing time ("elapsed") is only slightly shorter with
> parallelisation (4.35 s) than without (4.9 s). You've probably been
> looking at "user" time, i.e. the amount of CPU time your interactive R
> session consumed. Since with parallel processing, the R session itself
> doesn't perform the actual calculation (as explained above), it is
> mostly waiting for results to become available and "user" time is
> therefore reduced drastically. In short, when measuring performance
> improvements from parallelisation, always look at the total "elapsed" time.
>
> So why isn't parallel processing twice as fast as performing the
> calculation in a single thread? Perhaps the advantage of using both CPU
> cores was eaten up by the communication overhead. You should also take
> into account that a lot of other processes (terminals, GUI, daemons,
> etc.) are running on your computer at the same time, so even with
> parallel processing you will not have both cores fully available to R.
> In my experience, there is little benefit in parallelisation as long as
> you just have two CPU cores on your computer (rather than, say, 8 cores).
>
> Hope this clarifies things a bit (and is reasonably accurate, since I
> don't have much experience with parallelisation),
> Stefan
>
> [ stefan.evert at uos.de | http://purl.org/stefan.evert ]
--
Dipl.-Tech. Math. Markus Schmidberger
Ludwig-Maximilians-Universität München
IBE - Institut für medizinische Informationsverarbeitung,
Biometrie und Epidemiologie
Marchioninistr. 15, D-81377 Muenchen
URL: http://www.ibe.med.uni-muenchen.de
Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
Tel: +49 (089) 7095 - 4599