[Bioc-devel] BiocParallel

Ryan C. Thompson rct at thompsonclan.org
Sat Nov 17 22:05:29 CET 2012

On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
> In addition to Steve's comment, is it really a good thing that "all code
> stays the same."?  I mean, multiple machines vs. multiple cores are,
> often, _very_ different things: for instance, shared vs. distributed
> memory, communication overhead differences, whether or not you can assume
> packages and objects to be automagically present in the slaves/child
> process, etc. So, given they are different situations, I think it
> sometimes makes sense to want to write different code for each situation
> (I often do); not to mention Steve's hybrid cases ;-).
> Since BiocParallel seems to be a major undertaking, maybe it would be
> appropriate to provide a flexible approach, instead of hard wiring the
> foreach approach.
Of course there are cases where the same code simply can't work for both
multicore and multi-machine situations, but those generally don't fall
into the category of things that can be done using lapply. lapply and
its parallel counterparts such as mclapply, parLapply, and foreach are
designed for data-parallel operations with no interdependence between
results. Such operations generally parallelize as well across machines
as across cores, unless your network is too slow (in which case you
would simply not use multi-machine parallelism). If you want a parallel
algorithm for something like the disjoin method of GRanges, you might
need to write special-purpose code, and that code might be very
different for multicore vs. multi-machine.
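
To make the lapply point concrete, here is a minimal sketch using only
the parallel package that ships with R. The function square, the input
vector, and the worker count of 2 are made up for illustration; the
"cluster" runs on the local machine only so that the example is
self-contained.

```r
library(parallel)

# Hypothetical data-parallel task: each result depends only on its own input.
square <- function(x) x^2
inputs <- 1:8

# Serial baseline.
res_serial <- lapply(inputs, square)

# Multicore via forking. Forking is unavailable on Windows, where
# mc.cores must be 1, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_multicore <- mclapply(inputs, square, mc.cores = n_cores)

# Socket cluster: two local workers here; a character vector of host names
# passed to makeCluster() would place the workers on other machines.
cl <- makeCluster(2)
res_cluster <- parLapply(cl, inputs, square)
stopCluster(cl)

# All three calls return the same list.
stopifnot(identical(res_serial, res_multicore),
          identical(res_serial, res_cluster))
```

The per-element work is the same in all three cases. What changes is
the call itself: a different function name, a different argument list,
and a cluster object to create and shut down.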

So yes, sometimes there is a fundamental reason that you have to change
the code to make it run on multiple machines, and neither foreach nor
any other parallelization framework will save you from having to rewrite
your code. Often, though, there is no fundamental reason for the code to
change, yet you end up changing it anyway because of limitations in your
parallelization framework. That is the case foreach saves you from.
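
As a rough illustration of that last case, the sketch below keeps one
foreach loop fixed and only swaps the registered backend. It assumes
the foreach and doParallel packages are installed; the helper run_loop,
the loop body, and the worker count of 2 are invented for this example,
and doParallel is just one of several possible backends.

```r
library(foreach)
library(doParallel)

# Hypothetical loop: written once, never edited when the backend changes.
run_loop <- function() {
  foreach(i = 1:8, .combine = c) %dopar% sqrt(i)
}

# Backend 1: sequential execution, no parallelism at all.
registerDoSEQ()
res_seq <- run_loop()

# Backend 2: two local cores (forked workers on Unix-alikes).
registerDoParallel(cores = 2)
res_cores <- run_loop()

# Backend 3: an explicit socket cluster. Two local workers here; host
# names passed to makeCluster() would put the workers on other machines.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)
res_cluster <- run_loop()
parallel::stopCluster(cl)

# Same loop, same results, whichever backend was registered.
stopifnot(identical(res_seq, res_cores),
          identical(res_seq, res_cluster))
```

Only the registration lines differ between the three runs, so the
choice between cores and machines stays outside the code that does the
work.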
