[R] How to utilise dual cores and multi-processors on WinXP

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 6 19:07:18 CET 2007


rhelp.20.trevva at spamgourmet.com writes:

> Hello,
>
> I have a question that I was wondering if anyone had a fairly
> straightforward answer to: what is the quickest and easiest way to
> take advantage of the extra cores / processors that are now
> commonplace on modern machines? And how do I do that in Windows?

> I realise that this is a complex question that is not answered easily,
> so let me refine it some more. The type of scripts that I'm dealing
> with are well suited to parallelisation - often they involve mapping
> out parameter space by changing a single parameter and then re-running
> the simulation 10 (or n) times, and then bringing all the results back
> together at the end for analysis. If I can distribute the runs over
> all the processors available in my machine, I'm going to roughly halve
> the run time. The question is, how to do this?
>
> I've looked at many of the packages in this area: Rmpi, snow, snowFT,
> rpvm, and taskPR - these all seem to have the functionality that I
> want, but don't exist for Windows. The best solution is to switch to
> Linux, but unfortunately that's not an option.

Rmpi runs on windows (see http://www.stats.uwo.ca/faculty/yu/Rmpi/).

You'll end up modifying your code, probably using one of the many
parLapply-like functions (from Rmpi; comparable functions are in snow
and the papply package) to do an 'lapply' spread over the available
compute processors. This is likely to require some thought: data
transmission costs can overwhelm any speedup, and the FUN argument to
the lapply-like functions should reference only local variables. The
classic first attempt performs the equivalent of 1000 bootstraps on
each node, rather than dividing the 1000 replicates amongst the nodes
(which is actually quite hard to get right).
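To make the chunking point concrete, here is a sketch in base R. The
chunk-splitting itself runs as-is; the bootstrap statistic is a toy
stand-in, and on a real cluster you would replace the lapply call with
parLapply (snow) or mpi.parLapply (Rmpi), plus whatever cluster setup
those packages require (not shown here).

```r
## Sketch: divide nreps bootstrap replicates amongst nworkers,
## rather than running all nreps on every node.
nreps <- 1000
nworkers <- 2

## Assign each replicate index to a worker, as evenly as possible
chunks <- split(seq_len(nreps), rep(seq_len(nworkers), length.out = nreps))

## FUN references only local variables (its arguments); each call
## handles one chunk of replicate indices. The statistic here is a
## toy example only.
runChunk <- function(idx) {
    sapply(idx, function(i) mean(sample(10, 10, replace = TRUE)))
}

## On one machine: lapply. On a cluster: parLapply(cl, chunks, runChunk)
res <- unlist(lapply(chunks, runChunk))
```

The point is that each worker sees only its own chunk of replicate
indices, so the total work done is nreps replicates, not nreps per
node.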

In principle I think you might also be able to use a parallelized
LAPACK, following the general instructions in the R Installation and
Administration manual. I have not done this. It would likely be a
challenge, and would (perhaps) benefit only code that uses the LAPACK
linear algebra routines.

> Another option is to divide the task in half from the beginning, spawn
> two "slave" instances of R (e.g. via Rcmd), let them run, and then
> collate the results at the end. But how exactly to do this and how to
> know when they're done?
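One low-tech version of this (a sketch only; the script and file names
below are made up for illustration) is to launch the slave instances
with system(..., wait = FALSE) and have each slave script end by
save()-ing its results to a file, whose existence then signals
completion:

```r
## Sketch: spawn two slave R sessions and collate when both finish.
## 'job1.R' and 'job2.R' are hypothetical scripts, each ending with
## something like: save(result, file = "job1.RData")
scripts  <- c("job1.R", "job2.R")
outfiles <- c("job1.RData", "job2.RData")

## Launch both slaves without waiting (Rcmd BATCH on Windows)
for (s in scripts)
    system(paste("Rcmd BATCH", s), wait = FALSE)

## Poll until both result files exist
while (!all(file.exists(outfiles)))
    Sys.sleep(5)

## Collate: load each file into its own environment and extract 'result'
results <- lapply(outfiles, function(f) {
    e <- new.env()
    load(f, envir = e)
    get("result", envir = e)
})
```

This is crude (a crash in a slave would leave the master polling
forever), but it needs nothing beyond base R.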

The Bioconductor package Biobase has a function Aggregate that might
be fun to explore; I don't think it receives much use.

> Can anyone recommend a nice solution? I'm sure that I'm not the only
> one who'd love to double their computational speed...
>
> Cheers,
>
> Mark
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


