[Bioc-devel] BiocParallel

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Tue Nov 20 15:59:55 CET 2012

On Sat, 17 Nov 2012 13:05:29 -0800,"Ryan C. Thompson" <rct at thompsonclan.org> wrote:

> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
> > In addition to Steve's comment, is it really a good thing that "all code
> > stays the same"?  I mean, multiple machines vs. multiple cores are,
> > often, _very_ different things: for instance, shared vs. distributed
> > memory, communication overhead differences, whether or not you can assume
> > that packages and objects are automagically present in the slave/child
> > processes, etc. So, given they are different situations, I think it
> > sometimes makes sense to want to write different code for each situation
> > (I often do); not to mention Steve's hybrid cases ;-).
> >
> >
> > Since BiocParallel seems to be a major undertaking, maybe it would be
> > appropriate to provide a flexible approach, instead of hard wiring the
> > foreach approach.
> Of course there are cases where the same code simply can't work for both 
> multicore and multi-machine situations, but those generally don't fall 
> into the category of things that can be done using lapply. The lapply
> function and all of its parallelized buddies like mclapply, parLapply, and foreach 
> are designed for data-parallel operations with no interdependence 
> between results, and these kinds of operations generally parallelize as 
> well across machines as across cores, unless your network is not fast 
> enough (in which case you would choose not to use multi-machine 
> parallelism). If you want a parallel algorithm for something like the 
> disjoin method of GRanges, you might need to write some special purpose 
> code, and that code might be very different for multicore vs multi-machine.

> So yes, sometimes there is a fundamental reason that you have to change 
> the code to make it run on multiple machines, and neither foreach nor 
> any other parallelization framework will save you from having to rewrite 
> your code. But often there is no fundamental reason that the code has to 
> change, but you end up changing it anyway because of limitations in your 
> parallelization framework. This is the case that foreach saves you from.

Hummm... I guess you are right: we are talking about what happens "often"
or "most of the time", and that is where this approach fits. Point taken.
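To make the data-parallel case from the quoted message concrete, here is a minimal sketch in Python rather than R, assuming only the standard library's concurrent.futures module (it does not show BiocParallel's or foreach's actual API). The helper par_lapply and the toy gc_content function are hypothetical names used only for illustration. Because each element is processed independently, the same call works unchanged whether no executor is given (serial), or a thread pool is passed in; only the backend object changes.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def par_lapply(f: Callable[[T], R], xs: Sequence[T],
               executor: Optional[Executor] = None) -> List[R]:
    """Hypothetical lapply-like helper: apply f to every element of xs.

    The results do not depend on each other, so the caller can pick any
    backend (none = serial, threads, processes) without changing f or xs.
    """
    if executor is None:
        # Serial fallback, analogous to plain lapply.
        return [f(x) for x in xs]
    # Executor.map preserves input order, so results line up with xs.
    return list(executor.map(f, xs))


def gc_content(seq: str) -> float:
    """Toy per-element task: fraction of G or C bases in a DNA string."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


if __name__ == "__main__":
    seqs = ["ACGT", "GGCC", "ATAT"]
    serial = par_lapply(gc_content, seqs)
    with ThreadPoolExecutor(max_workers=2) as pool:
        threaded = par_lapply(gc_content, seqs, pool)
    print(serial, threaded)  # both are [0.5, 1.0, 0.0]
```

Passing a concurrent.futures.ProcessPoolExecutor instead of the thread pool should give the same results, subject to the usual caveats: the function must be picklable, and on platforms that start workers with "spawn" the calling code needs an if __name__ == "__main__": guard. Those caveats are exactly the kind of shared vs. separate memory differences raised earlier in the thread.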



Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina 
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid

Phone: +34-91-497-2412

Email: rdiaz02 at gmail.com
       ramon.diaz at iib.uam.es

