[Bioc-devel] BiocParallel: BatchJobs backend (Was: Re: BiocParallel)

Tue Jun 25 14:17:56 CEST 2013

Hi Henrik,

Sorry for the late response. Suggestions and feedback are always
welcome. I just forgot to enable the issue tracker (now enabled).

For prototyping I usually use Interactive/Multicore, but I'll
regularly test on our local clusters which use Torque or Slurm,
respectively.

Michel

2013/6/7 Henrik Bengtsson <hb at biostat.ucsf.edu>:
> Great - this looks promising already.
>
> What's your test system(s), beyond standard SSH and multicore
> clusters?  I'm on a Torque/PBS system.
>
> I'm happy to test, give feedback etc.  I don't see an 'Issues' tab on
> the GitHub page.  Michel, how do you prefer to get feedback?
>
> /Henrik
>
>
> On Thu, Jun 6, 2013 at 5:21 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> And here is the on-going development of the backend:
>> https://github.com/mllg/BiocParallel/tree/batchjobs
>>
>> Not sure how well it's been tested.
>>
>> Kudos to Michel Lang for making so much progress so quickly.
>>
>> Michael
>>
>> On Thu, Jun 6, 2013 at 1:59 PM, Dan Tenenbaum <dtenenba at fhcrc.org> wrote:
>>>
>>> On Thu, Jun 6, 2013 at 1:56 PM, Henrik Bengtsson <hb at biostat.ucsf.edu>
>>> wrote:
>>> > Hi, I'd like to pick up the discussion on a BatchJobs backend for
>>> > BiocParallel where it was left back in Dec 2012 (Bioc-devel thread
>>> > 'BiocParallel'
>>> > [https://stat.ethz.ch/pipermail/bioc-devel/2012-December/003918.html]).
>>> >
>>> > Florian, would you mind sharing your BatchJobs backend code?  Is it
>>> > independent of BiocParallel and/or have you tried it with the most
>>> > recent BiocParallel implementation
>>> > [https://github.com/Bioconductor/BiocParallel/]?
>>> >
>>>
>>> You should be aware that there is a Google Summer of Code project in
>>> progress to address this.
>>>
>>> http://www.bioconductor.org/developers/gsoc2013/ (towards the bottom)
>>>
>>> Dan
>>>
>>>
>>> > /Henrik
>>> >
>>> > On Tue, Dec 4, 2012 at 12:38 PM, Henrik Bengtsson <hb at biostat.ucsf.edu>
>>> > wrote:
>>> >> Thanks.
>>> >>
>>> >> On Tue, Dec 4, 2012 at 3:47 AM, Vincent Carey
>>> >> <stvjc at channing.harvard.edu> wrote:
>>> >>> I have been booked up so no chance to deploy but I do have access to
>>> >>> SGE and LSF so will try and will report ASAP.
>>> >>
>>> >> ...and I'll try it out on PBS (... but I most likely won't have time
>>> >> to do this until the end of the year).
>>> >>
>>> >> Henrik
>>> >>
>>> >>>
>>> >>>
>>> >>> On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian
>>> >>> <florian.hahne at novartis.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Henrik,
>>> >>>> I have now come up with a relatively generic version of this
>>> >>>> SGEcluster approach. It does indeed use BatchJobs under the hood and
>>> >>>> should thus support all available cluster queues, assuming that the
>>> >>>> necessary BatchJobs routines are available. I could only test this on
>>> >>>> our SGE cluster, but Vince wanted to try other queuing systems. Not
>>> >>>> sure how far he got. For now the code is wrapped in a little package
>>> >>>> called Qcluster with some documentation. If you want, I can send you
>>> >>>> a version in a separate mail. It would be good to test this on other
>>> >>>> systems, and I am sure there remain some bugs that need to be ironed
>>> >>>> out. In particular, the fault tolerance you mentioned needs to be
>>> >>>> addressed properly. Currently the code may leave unwanted garbage if
>>> >>>> things fail in the wrong places because all the communication is
>>> >>>> file-based.
>>> >>>> Martin, I'll send you my updated version in case you want to include
>>> >>>> this in BiocParallel for others to contribute.
>>> >>>> Florian
>>> >>>> --
>>> >>>>
>>> >>>> On 12/4/12 5:46 AM, "Henrik Bengtsson" <hb at biostat.ucsf.edu> wrote:
>>> >>>>
>>> >>>> >Picking up this thread for lack of other places (= where should
>>> >>>> >BiocParallel be discussed?)
>>> >>>> >
>>> >>>> >I saw Martin's updates on the BiocParallel - great.  Florian's SGE
>>> >>>> >scheduler was also mentioned; is that one built on top of BatchJobs?
>>> >>>> >If so I'd be interested in looking into that/generalizing that to
>>> >>>> >work with any BatchJobs scheduler.
>>> >>>> >
>>> >>>> >I believe there is going to be a new release of BatchJobs rather
>>> >>>> >soon, so it's probably worth waiting until that is available.
>>> >>>> >
>>> >>>> >The main use case I'm interested in is to launch batch jobs on a
>>> >>>> >PBS/Torque cluster, and then use multicore processing on each
>>> >>>> >compute node.  It would be nice to be able to do this using the
>>> >>>> >BiocParallel model, but maybe it is too optimistic to get everything
>>> >>>> >to work under the same model.  Also, as Vince hinted, fault
>>> >>>> >tolerance etc. needs to be addressed, and addressed differently in
>>> >>>> >the different setups.
>>> >>>> >
>>> >>>> >/Henrik
>>> >>>> >
>>> >>>> >On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte
>>> >>>> > <rdiaz02 at gmail.com>
>>> >>>> >wrote:
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Sat, 17 Nov 2012 13:05:29 -0800,"Ryan C. Thompson"
>>> >>>> >><rct at thompsonclan.org> wrote:
>>> >>>> >>
>>> >>>> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
>>> >>>> >>> > In addition to Steve's comment, is it really a good thing that
>>> >>>> >>> > "all code stays the same."?  I mean, multiple machines vs.
>>> >>>> >>> > multiple cores are, often, _very_ different things: for
>>> >>>> >>> > instance, shared vs. distributed memory, communication overhead
>>> >>>> >>> > differences, whether or not you can assume packages and objects
>>> >>>> >>> > to be automagically present in the slaves/child process, etc.
>>> >>>> >>> > So, given they are different situations, I think it sometimes
>>> >>>> >>> > makes sense to want to write different code for each situation
>>> >>>> >>> > (I often do); not to mention Steve's hybrid cases ;-).
>>> >>>> >>> >
>>> >>>> >>> >
>>> >>>> >>> > Since BiocParallel seems to be a major undertaking, maybe it
>>> >>>> >>> > would be appropriate to provide a flexible approach, instead of
>>> >>>> >>> > hard wiring the foreach approach.
>>> >>>> >>> Of course there are cases where the same code simply can't work
>>> >>>> >>> for both multicore and multi-machine situations, but those
>>> >>>> >>> generally don't fall into the category of things that can be done
>>> >>>> >>> using lapply. Lapply and all of its parallelized buddies like
>>> >>>> >>> mclapply, parLapply, and foreach are designed for data-parallel
>>> >>>> >>> operations with no interdependence between results, and these
>>> >>>> >>> kinds of operations generally parallelize as well across machines
>>> >>>> >>> as across cores, unless your network is not fast enough (in which
>>> >>>> >>> case you would choose not to use multi-machine parallelism). If
>>> >>>> >>> you want a parallel algorithm for something like the disjoin
>>> >>>> >>> method of GRanges, you might need to write some special purpose
>>> >>>> >>> code, and that code might be very different for multicore vs
>>> >>>> >>> multi-machine.
>>> >>>> >>
>>> >>>> >>> So yes, sometimes there is a fundamental reason that you have to
>>> >>>> >>> change the code to make it run on multiple machines, and neither
>>> >>>> >>> foreach nor any other parallelization framework will save you
>>> >>>> >>> from having to rewrite your code. But often there is no
>>> >>>> >>> fundamental reason that the code has to change, but you end up
>>> >>>> >>> changing it anyway because of limitations in your parallelization
>>> >>>> >>> framework. This is the case that foreach saves you from.
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> Hummm... I guess you are right, and we are talking about "often"
>>> >>>> >> or "most of the time", which is where all this would fit. Point
>>> >>>> >> taken.
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> Best,
>>> >>>> >>
>>> >>>> >> R.
>>> >>>> >>
>>> >>>> >> --
>>> >>>> >> Ramon Diaz-Uriarte
>>> >>>> >> Department of Biochemistry, Lab B-25
>>> >>>> >> Facultad de Medicina
>>> >>>> >> Universidad Autónoma de Madrid
>>> >>>> >> Arzobispo Morcillo, 4
>>> >>>> >> 28029 Madrid
>>> >>>> >> Spain
>>> >>>> >>
>>> >>>> >> Phone: +34-91-497-2412
>>> >>>> >>
>>> >>>> >> Email: rdiaz02 at gmail.com
>>> >>>> >>        ramon.diaz at iib.uam.es
>>> >>>> >>
>>> >>>> >> http://ligarto.org/rdiaz
>>> >>>> >>
>>> >>>> >> _______________________________________________
>>> >>>> >> Bioc-devel at r-project.org mailing list
>>> >>>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>> >>>>
>>> >>>
>>> >
>>
>>