[Bioc-devel] parallel package generics

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Oct 23 18:13:27 CEST 2012


In response to a question from yesterday, I pointed someone to the
ShortRead `srapply` function, and I wondered to myself why it
necessarily had to be "buried" in the ShortRead package (aside from it
having an `sr` prefix).

I had thought it might be a good idea to move that (or something like
it) to BiocGenerics (unless implementations aren't allowed there), but
I also realized that it would add dependencies for people who might
not need them.

But almost surely a large majority of people will be happy to do some
form of ||-ization (parallelization), so in my mind it's not such an
onerous thing to add. On the other hand, this large majority is
probably enriched for people doing NGS analysis, in which case keeping
it in ShortRead makes some sense.

Taking one step back, I recall some chatter a week or two ago about
better ||-ization "primitives" (something about a `pvec` doo-dad) and
about ideas to wrap different types of ||-ization behind an easy-to-use
interface (I think this was that conversation). Taking a further step
back, I often wonder why we don't just bite the bullet and take
advantage of the `foreach` infrastructure that is already out there. In
that case, I could imagine a "doSGE" package that would handle the
particulars of what Florian is referring to. You could then configure
it externally via something like `registerDoSGE(some.config.object)`
and just have the package code happily run everything through
`foreach(...) %dopar%` and be done with it.

... at least, I thought this was what was being discussed here (and
what popped up a week or two ago); sorry if I completely missed the
mark ...
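A minimal sketch of the pattern described above, assuming the CRAN `foreach` and `doParallel` packages and a local two-worker socket cluster. The "doSGE" package and `registerDoSGE()` are hypothetical; the point is that only the backend-registration lines would change, while the package code stays the same.

```r
library(foreach)
library(doParallel)

# Backend configuration happens outside the package code.
# A hypothetical doSGE package would replace these two lines with
# something like registerDoSGE(some.config.object).
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# "Package code": uses only foreach/%dopar% and never names the backend.
square_all <- function(xs) {
  foreach(x = xs, .combine = c) %dopar% {
    x^2
  }
}

res <- square_all(1:4)
parallel::stopCluster(cl)
print(res)
```

Because `square_all` only calls `foreach(...) %dopar%`, swapping the local cluster for a grid-engine backend would, in this design, be a change the user makes at registration time rather than something each package author has to implement.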

-steve


On Tue, Oct 23, 2012 at 10:38 AM, Hahne, Florian
<florian.hahne at novartis.com> wrote:
> Hi Martin,
> I could define the generics in my own package, but then they would only
> be available there, in the global environment (assuming that I also
> export them), or in additional packages that explicitly import them
> from my name space. There are already a whole bunch of packages around
> that allow for parallelization via a cluster object. Obviously those
> all import the parLapply function from the parallel package. That means
> that I can't simply supply my own modified cluster object, because the
> code that calls parLapply will not know about the generic in my
> package, even if it is attached. Ideally parLapply would already be a
> generic function in the parallel package. Not sure who needs to be
> convinced in order for this to happen, but my gut feeling was that it
> could be easier to have the generic in BiocGenerics.
> Maybe I am missing something obvious here, but imo there is no way to
> override parLapply globally for my own class unless the generic is
> imported by everyone who wants to make use of the special method.
> Florian
>
> On 10/23/12 2:20 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
>
>>On 10/17/2012 05:45 AM, Hahne, Florian wrote:
>>> Hi all,
>>> I was wondering whether it would be possible to have proper generics for
>>> some of the functions in the parallel package, e.g. parLapply and
>>> clusterCall. The reason I am asking is that I want to build an S4 class
>>> that essentially looks like an S3 cluster object but knows how to deal
>>> with the SGE. That way I can abstract away all the overhead of job
>>> submission, job status and reducing the results in the parLapply method
>>> of that class, and I would be able to supply this new cluster object to
>>> all of my existing functions that can be processed in parallel using a
>>> cluster object as input. I have played around with the BatchJobs package
>>> as an abstraction layer to SGE and that works nicely. As a test case I
>>> created the necessary generics myself in order to supply my own
>>> SGEcluster object to a function that normally deals with the "regular"
>>> parallel package S3 cluster objects, and everything just worked out of
>>> the box, but obviously this fails once I am in a name space and my
>>> generic is not found anymore. Of course what we would really want is
>>> some proper abstraction of parallelization in R, but for now this seems
>>> to be at least a cheap compromise. Any thoughts on this?
>>
>>Hi Florian -- we talked about this locally, but I guess we didn't
>>actually send any email!
>>
>>Is there an obstacle to promoting these to generics in your own package?
>>The usual motivation for inclusion in BiocGenerics has been to avoid
>>conflicts between packages, but I'm not sure whether this is the case
>>(yet)? This would also add a dependency fairly deep in the hierarchy.
>>
>>What do you think?
>>
>>Martin
>>
>>> Florian
>>>
>>
>>
>>--
>>Computational Biology / Fred Hutchinson Cancer Research Center
>>1100 Fairview Ave. N.
>>PO Box 19024 Seattle, WA 98109
>>
>>Location: Arnold Building M1 B861
>>Phone: (206) 667-2793
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


