[Bioc-devel] BiocParallel

Ryan C. Thompson rct at thompsonclan.org
Sat Nov 17 05:42:38 CET 2012


The difference is that in the parallel package, you use mclapply for 
multicore and parLapply for multi-machine parallelism. If you want to 
switch from one to the other, you have to change every call to one 
function into a call to the other. If you use llply(..., 
.parallel=TRUE), then all you have to do is register a different 
backend: one line of code to load the new backend and a second one to 
register it. The rest of your code stays the same.
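
A minimal sketch of what I mean, assuming the plyr and doParallel 
packages are installed. doParallel is just one foreach backend picked 
for illustration, and the helper name run_analysis is made up for this 
example:

```r
library(plyr)        # llply(..., .parallel = TRUE) dispatches through foreach
library(doParallel)  # illustrative backend choice; also attaches foreach and parallel

# Hypothetical helper standing in for "the rest of your code".
# It never mentions which backend is in use.
run_analysis <- function() {
  llply(1:4, function(i) i^2, .parallel = TRUE)
}

# Backend 1: worker processes on the local machine.
registerDoParallel(cores = 2)
res_multicore <- run_analysis()

# Backend 2: an explicit cluster object. Here it is two local workers,
# but makeCluster() also accepts a vector of host names.
cl <- makeCluster(2)
registerDoParallel(cl)
res_cluster <- run_analysis()
stopCluster(cl)

# Go back to sequential execution so later %dopar% calls do not try
# to use the stopped cluster.
registerDoSEQ()
```

Only the registration lines differ between the two runs. With the 
parallel package alone, the same switch would mean rewriting each 
mclapply(X, FUN) call as parLapply(cl, X, FUN).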

On Fri 16 Nov 2012 03:24:56 PM PST, Michael Lawrence wrote:
>
>
> On Fri, Nov 16, 2012 at 11:44 AM, Ryan C. Thompson
> <rct at thompsonclan.org> wrote:
>
>     You don't have to use foreach directly. I use foreach almost
>     exclusively through the plyr package, which uses foreach
>     internally to implement parallelism. Like you, I'm not
>     particularly fond of the foreach syntax (though it has some nice
>     features that come in handy sometimes).
>
>     The appeal of foreach is that it supports pluggable parallelizing
>     backends, so you can (in theory) write the same code and
>     parallelize it across multiple cores, or across an entire cluster,
>     just by plugging in different backends.
>
>
> But isn't this also possible with the parallel package? It was
> inherited from snow. I'd be more in favor of extending the parallel
> package, simply because it's part of base R.
>
>
>
>     On Fri 16 Nov 2012 10:17:24 AM PST, Michael Lawrence wrote:
>
>         I'm not sure I understand the appeal of foreach. Why not do this
>         within the functional paradigm, i.e., parLapply?
>
>         Michael
>
>         On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson
>         <rct at thompsonclan.org> wrote:
>
>             You could write a %dopar% backend for the foreach package,
>             which would allow any code using foreach (or plyr which
>             uses foreach) to parallelize using your code.
>
>             On a related note, it might be nice to add
>             Bioconductor-compatible versions of foreach and the plyr
>             functions to BiocParallel if they're not already compatible.
>
>
>             On 11/16/2012 12:18 AM, Hahne, Florian wrote:
>
>                 I've hacked up some code that uses BatchJobs but makes it
>                 look like a normal parLapply operation. Currently the main
>                 R process checks the state of the queue at regular
>                 intervals and fetches results once a job has finished.
>                 Seems to work quite nicely, although there certainly are
>                 more elaborate ways to deal with the
>                 synchronous/asynchronous issue. Is that something that
>                 could be interesting for the broader audience? I could add
>                 the code to BiocParallel for folks to try it out.
>                 The whole thing may be a dumb idea, but I find it kind of
>                 useful to be able to start parallel jobs directly from R on
>                 our huge SGE cluster, have the calling script wait for all
>                 jobs to finish and then continue with some downstream
>                 computations, rather than having to manually check the job
>                 status and start another script once the results are there.
>                 Florian
>
>
>
>


