[Bioc-devel] parallel package generics

Martin Morgan mtmorgan at fhcrc.org
Tue Oct 23 23:10:30 CEST 2012


Hi Steve --

On 10/23/2012 10:20 AM, Vincent Carey wrote:
> On Tue, Oct 23, 2012 at 12:13 PM, Steve Lianoglou <
> mailinglist.honeypot at gmail.com> wrote:
>
>> In response to a question from yesterday, I pointed someone to the
>> ShortRead `srapply` function and I wondered to myself why it had to
>> necessarily be "buried" in the ShortRead package (aside from it
>> having a `sr` prefix).

I don't know that srapply necessarily 'got it right'...

>>
>> I had thought it might be a good idea to move that (or something like
>> that) to BiocGenerics (unless implementations aren't allowed there)
>> but also realized that it would add more dependencies where someone
>> might not necessarily need them.
>>
>> But, almost surely, a large majority of the people will be happy to do
>> some form of ||-ization, so in my mind it's not such an onerous thing
>> to add -- on the other hand, this large majority is probably enriched
>> for people who are doing NGS analysis, in which case, keeping it in
>> ShortRead can make some sense.
>>
>> Taking one step back, I recall some chatter last week (or two) about
>> some better ||-ization "primitives" -- something about a pvec doo-dad,
>> and there being ideas to wrap different types of ||-ization behind an
>> easy to use interface (I think this was the convo), and then I took a
>> further step back and often wonder why we just don't bite the bullet
>> and take advantage of the `foreach` infrastructure that is already out
>> there -- in which case, I could imagine a "doSGE" package that might
>> handle the particulars of what Florian is referring to. You could then
>> configure it externally via some `registerDoSGE(some.config.object)`
>> and just have the package code happily run it through `foreach(...)
>> %dopar%` and be done w/ it.
>>
>>
> IMHO it is relevant.  I have not looked for other abstractions, and this
> one seems to work.  Florian's objectives might be a good test case for
> adequacy.

The registerDoDah does seem to be a useful abstraction.

I think there's a lot of work to do for some sort of coordinated parallelization
that putting parLapply into BiocGenerics might encourage; bad things will
happen when everyone in a call stack tries to parallelize independently. But I'm
in favor of parLapply in BiocGenerics, at least for the moment.
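
For concreteness, a minimal sketch of what promoting parLapply to an S4
generic might look like. This is only an illustration under assumptions: the
SGEcluster class name comes from Florian's description, but its `queue` slot
and the serial method body are placeholders, not code from BatchJobs or any
existing package.

```r
library(methods)

# Hypothetical S4 class standing in for an SGE-aware cluster object.
# The `queue` slot is an illustrative assumption.
setClass("SGEcluster", representation(queue = "character"))

# Promote parLapply to an S4 generic that dispatches on `cl` only.
setGeneric("parLapply",
           function(cl = NULL, X, fun, ...) standardGeneric("parLapply"),
           signature = "cl")

# Default method: delegate to the parallel package, so ordinary S3
# cluster objects keep behaving as before.
setMethod("parLapply", "ANY",
          function(cl = NULL, X, fun, ...) parallel::parLapply(cl, X, fun, ...))

# Method for the new class. A real implementation would submit jobs,
# track their status and reduce the results; this placeholder simply
# evaluates serially so the dispatch can be seen.
setMethod("parLapply", "SGEcluster",
          function(cl = NULL, X, fun, ...) lapply(X, fun, ...))

sge <- new("SGEcluster", queue = "all.q")
res <- parLapply(sge, 1:3, function(i) i^2)
```

As Florian points out, a package that imports parLapply directly from
parallel would still call the non-generic function, which is the argument
for giving the generic a shared home.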

Martin

>
>
>> ... at least, I thought this is what was being talked about here (and
>> popped up a week or two ago) -- sorry if I completely missed the mark
>> ...
>>
>> -steve
>>
>>
>> On Tue, Oct 23, 2012 at 10:38 AM, Hahne, Florian
>> <florian.hahne at novartis.com> wrote:
>>> Hi Martin,
>>> I could define the generics in my own package, but that would mean that
>>> those will only be available there, or in the global environment assuming
>>> that I also export them, or in all additional packages that explicitly
>>> import them from my name space. Now there already are a whole bunch of
>>> packages around that all allow for parallelization via a cluster object.
>>> Obviously those all import the parLapply function from the parallel
>>> package. That means that I can't simply supply my own modified cluster
>>> object, because the code that calls parLapply will not know about the
>>> generic in my package, even if it is attached. Ideally parLapply would be
>>> a generic function already in the parallel package. Not sure who needs to
>>> be convinced in order for this to happen, but my gut feeling was that it
>>> could be easier to have the generic in BiocGenerics.
>>> Maybe I am missing something obvious here, but imo there is no way to
>>> overwrite parLapply globally for my own class unless the generic is
>>> imported by everyone who wants to make use of the special method.
>>> Florian
>>> --
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10/23/12 2:20 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>>>
>>>> On 10/17/2012 05:45 AM, Hahne, Florian wrote:
>>>>> Hi all,
>>>>> I was wondering whether it would be possible to have proper generics
>>>>> for some of the functions in the parallel package, e.g. parLapply and
>>>>> clusterCall. The reason I am asking is that I want to build an S4
>>>>> class that essentially looks like an S3 cluster object but knows how
>>>>> to deal with the SGE. That way I can abstract away all the overhead
>>>>> regarding job submission, job status and reducing the results in the
>>>>> parLapply method of that class, and would be able to supply this new
>>>>> cluster object to all of my existing functions that can be processed
>>>>> in parallel using a cluster object as input. I have played around
>>>>> with the BatchJobs package as an abstraction layer to SGE and that
>>>>> works nicely. As a test case I have created the necessary generics
>>>>> myself in order to supply my own SGEcluster object to a function that
>>>>> normally deals with the "regular" parallel package S3 cluster objects
>>>>> and everything just worked out of the box, but obviously this fails
>>>>> once I am in a name space and my generic is not found anymore. Of
>>>>> course what we would really want is some proper abstraction of
>>>>> parallelization in R, but for now this seems to be at least a cheap
>>>>> compromise. Any thoughts on this?
>>>>
>>>> Hi Florian -- we talked about this locally, but I guess we didn't
>>>> actually send any email!
>>>>
>>>> Is there an obstacle to promoting these to generics in your own
>>>> package? The usual motivation for inclusion in BiocGenerics has been
>>>> to avoid conflicts between packages, but I'm not sure whether this is
>>>> the case (yet)? This would also add a dependency fairly deep in the
>>>> hierarchy.
>>>>
>>>> What do you think?
>>>>
>>>> Martin
>>>>
>>>>> Florian
>>>>>
>>>>
>>>>
>>>> --
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>   | Memorial Sloan-Kettering Cancer Center
>>   | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


