[Bioc-devel] BiocParallel

Ryan C. Thompson rct at thompsonclan.org
Fri Nov 16 20:53:36 CET 2012


To be more specific, instead of:

library(parallel)
cl <- ... # Make a cluster
parLapply(cl, X, fun, ...) # apply fun to each element of X on the workers

you can do:

library(parallel)
library(doParallel)
library(plyr)
cl <- ... # Make a cluster
registerDoParallel(cl)             # register cl as the foreach backend
llply(X, fun, ..., .parallel=TRUE) # plyr runs fun over X via that backend
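
For completeness, a runnable version of the same thing on a local machine (the
two-worker cluster and the toy input/function here are assumptions, not part of
the example above):

library(parallel)
library(doParallel)
library(plyr)

cl <- makeCluster(2)                          # hypothetical two-worker local cluster
registerDoParallel(cl)                        # register it as the foreach backend
res <- llply(1:10, sqrt, .parallel = TRUE)    # plyr farms the work out via foreach
stopCluster(cl)                               # release the workers when done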

On Fri 16 Nov 2012 11:44:06 AM PST, Ryan C. Thompson wrote:
> You don't have to use foreach directly. I use foreach almost
> exclusively through the plyr package, which uses foreach internally to
> implement parallelism. Like you, I'm not particularly fond of the
> foreach syntax (though it has some nice features that come in handy
> sometimes).
>
> The appeal of foreach is that it supports pluggable parallel
> backends, so you can (in theory) write the same code and parallelize
> it across multiple cores, or across an entire cluster, just by
> plugging in different backends.
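>
> As a rough sketch of that (the toy loop and the two-worker cluster here are
> assumptions, not something from this thread), the same foreach loop runs under
> whichever backend happens to be registered:
>
> library(foreach)
> library(doParallel)
>
> registerDoSEQ()                            # sequential backend
> res1 <- foreach(x = 1:4) %dopar% sqrt(x)
>
> cl <- makeCluster(2)                       # hypothetical two-worker local cluster
> registerDoParallel(cl)                     # parallel backend; same loop as above
> res2 <- foreach(x = 1:4) %dopar% sqrt(x)
> stopCluster(cl)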
>
> On Fri 16 Nov 2012 10:17:24 AM PST, Michael Lawrence wrote:
>> I'm not sure I understand the appeal of foreach. Why not do this
>> within the functional paradigm, i.e., parLapply?
>>
>> Michael
>>
>> On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson
>> <rct at thompsonclan.org> wrote:
>>
>>     You could write a %dopar% backend for the foreach package, which
>>     would allow any code using foreach (or plyr which uses foreach) to
>>     parallelize using your code.
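>>
>>     As a very rough sketch, assuming foreach's backend hooks (setDoPar,
>>     makeAccum, getResult) and ignoring package loading and export
>>     handling, a toy sequential backend could look like:
>>
>>     library(foreach)
>>     library(iterators)
>>
>>     doToy <- function(obj, expr, envir, data) {
>>       it <- iter(obj)              # yields one named argument list per iteration
>>       acc <- makeAccum(it)         # accumulator honouring the loop's .combine
>>       i <- 0L
>>       repeat {
>>         args <- tryCatch(nextElem(it), error = function(e) NULL)
>>         if (is.null(args)) break   # StopIteration: no more work
>>         i <- i + 1L
>>         e <- list2env(args, envir = new.env(parent = envir))
>>         acc(list(eval(expr, envir = e)), i)
>>       }
>>       getResult(it)                # combined result (a list by default)
>>     }
>>
>>     setDoPar(doToy, info = function(data, item) NULL)
>>     foreach(x = 1:3) %dopar% x^2   # now runs through doToy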
>>
>>     On a related note, it might be nice to add Bioconductor-compatible
>>     versions of foreach and the plyr functions to BiocParallel if
>>     they're not already compatible.
>>
>>
>>     On 11/16/2012 12:18 AM, Hahne, Florian wrote:
>>
>>         I've hacked up some code that uses BatchJobs but makes it look
>>         like a normal parLapply operation. Currently the main R process
>>         checks the state of the queue at regular intervals and fetches
>>         results once a job has finished. It seems to work quite nicely,
>>         although there are certainly more elaborate ways to deal with
>>         the synchronous/asynchronous issue. Is that something that could
>>         be interesting for the broader audience? I could add the code to
>>         BiocParallel for folks to try it out.
>>         The whole thing may be a dumb idea, but I find it kind of useful
>>         to be able to start parallel jobs directly from R on our huge
>>         SGE cluster, have the calling script wait for all jobs to finish
>>         and then continue with some downstream computations, rather than
>>         having to manually check the job status and start another script
>>         once the results are there.
>>         Florian
>>
>>
>>
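
For reference, the pattern Florian describes maps onto the BatchJobs API roughly
like this (a sketch only; the registry id, file directory and toy function are
assumptions):

library(BatchJobs)

reg <- makeRegistry(id = "myjobs", file.dir = "myjobs-files")
batchMap(reg, sqrt, 1:100)   # one job per element of the input
submitJobs(reg)              # hand the jobs to the configured scheduler (e.g. SGE)
waitForJobs(reg)             # poll the queue until every job has finished
res <- loadResults(reg)      # fetch the results back into the calling session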


