[BioC] multicore and GRangesList [Resurrected]

Cook, Malcolm MEC at stowers.org
Thu Sep 20 21:39:37 CEST 2012


Tim,

My understanding of the term "generic" is from CLOS (Common Lisp Object System)...

and I've not yet forayed much into OO R other than as a user....

but if I understand how the term "generic" is being used here...

I don't think that pvec itself need be a generic.

Rather it just needs to be written in terms of generics.

I may be missing the point.

If its arg, v, implements `[`, `length` and `c`, then it _should_ work.

But as written it does not.

Here is a (better?) version that does: https://gist.github.com/3757873
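
For illustration only, the idea can be sketched roughly as follows. This is not the contents of the gist: the name `pvec_generic` and the chunking via `parallel::splitIndices` are assumptions made for the example. The function touches `v` only through `length`, `[` and `c`, so any class with methods for those three should pass through.

```r
library(parallel)

## Hypothetical sketch (not the code in the gist): a pvec-like function
## that touches v only through length(), `[` and c().
## mclapply relies on forking; on Windows pass mc.cores = 1L.
pvec_generic <- function(v, FUN, ..., mc.cores = 2L) {
  FUN <- match.fun(FUN)
  n <- length(v)
  ## Nothing to split, or only one core requested: call FUN directly.
  if (n < 2L || mc.cores < 2L) return(FUN(v, ...))
  ## Contiguous index chunks, at most one per core.
  chunks <- splitIndices(n, min(n, mc.cores))
  ## Each forked worker applies FUN to its own slice of v.
  pieces <- mclapply(chunks, function(i) FUN(v[i], ...), mc.cores = mc.cores)
  ## Results are concatenated in chunk order.
  do.call(c, pieces)
}

## A plain list stands in for a GRangesList here.
x <- list(1:3, 1:5, 1:2, 1:7)
pvec_generic(x, lengths)
```

As with pvec, FUN has to be vectorized: it receives a slice of `v` and must return one result per element of that slice.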

Comments?  Improvements?  Is it better?

Thx,

 ~Malcolm

From: Tim Triche, Jr. [mailto:tim.triche at gmail.com] 
Sent: Thursday, September 20, 2012 12:16 PM
To: Cook, Malcolm
Cc: Martin Morgan; Michael Lawrence <lawrence.michael at gene.com> (lawrence.michael at gene.com); stefano.calza at med.unibs.it; Blanchette, Marco; Bioconductor Newsgroup (bioconductor at stat.math.ethz.ch)
Subject: Re: [BioC] multicore and GRangesList [Resurrected]

Why not make pvec a generic?


On Thu, Sep 20, 2012 at 10:06 AM, Cook, Malcolm <MEC at stowers.org> wrote:
Hi Martin,

The benefits of the functional stuff are purely stylistic.

And NOT (I have just learned) performance!

Indeed, after running some timing tests, I have rewritten pvec_along without using Compose & Curry, as:

pvec_along <-function(x,FUN,...) {
### PURPOSE: extension to parallel::pvec for non-vectors which is
### vectorized over the indices of x.
###
### Example: pvec_along(myGRangesList,width)
###          this is functionally equivalent to:
###          pvec(seq_along(myGRangesList),function(i) width(myGRangesList[i]))
###
### Requires: `library(parallel)`
  indices<-seq_along(x)
  FUN<-match.fun(FUN)
  ## FYI: repeated system.times using 11 cores showed 13% worse
  ## performance using `library(functional)` approach written as:
  ## pvec(indices,Compose(Curry(`[`,x),FUN),...)
  pvec(indices,function(indices) FUN(x[indices]),...)
}

Better?

So, my stylistic preferences are admonished.  I have been increasingly developing idiomatic use of Compose and Curry.  Perhaps I must stop.  Or learn, if possible, to avoid the overhead they impose.

Regardless....

In any case, pvec_along is just a simple convenience wrapper to something that could be directly written.  But I find it a very useful abstraction.

Do you see better ways of expressing this idiom?

It is arguable that mclapply (and pvec) should 'just work' over GRangesList.  After all, lapply does.

But, to remind us:

> parallel::mclapply(myGRangesList,width)
Error in as.list.default(X) :
  no method for coercing this S4 class to a vector

and, of course, pvec only works with vectors:

> pvec(myGRangesList,width)
Error in pvec(myGRangesList, width) : 'v' must be a vector

Do you think mclapply/pvec should work with Lists?

FWIW: one aspect of pvec that I think could be improved is how the results from each core are combined.  This is hard-wired to `c`, where it could instead be an optional parameter (e.g. `GRangesList`).
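
A rough sketch of what such a parameter might look like. `pvec_combine` and its `combine` argument are hypothetical; nothing of the kind exists in `parallel`, and the body uses mclapply over index chunks rather than modifying pvec itself.

```r
library(parallel)

## Hypothetical: `combine` is NOT an argument of parallel::pvec.
## mclapply relies on forking; on Windows pass mc.cores = 1L.
pvec_combine <- function(v, FUN, ..., combine = c, mc.cores = 2L) {
  FUN <- match.fun(FUN)
  combine <- match.fun(combine)
  n <- length(v)
  ## At least one chunk, at most one chunk per core.
  chunks <- splitIndices(n, max(1L, min(n, mc.cores)))
  pieces <- mclapply(chunks, function(i) FUN(v[i], ...), mc.cores = mc.cores)
  ## The caller decides how the per-chunk results are put together.
  do.call(combine, pieces)
}

## Per-chunk data frames are stacked with rbind instead of c.
squares <- pvec_combine(1:6, function(i) data.frame(i = i, sq = i^2),
                        combine = rbind)
```

The default `combine = c` keeps the behaviour of pvec; any function that accepts the per-chunk results as separate arguments can be supplied instead.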

In the meantime, FWIW, I have written a similar wrapper to mclapply named mclapply_alongRanges.
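
For illustration only, a wrapper of that kind might look roughly like the following. `mclapply_along` is a made-up name and this is not the actual mclapply_alongRanges; it simply iterates over indices so that only `seq_along` and `[[` are asked of `x`.

```r
library(parallel)

## Hypothetical sketch; not the author's mclapply_alongRanges.
## mclapply relies on forking; on Windows pass mc.cores = 1L.
mclapply_along <- function(x, FUN, ..., mc.cores = 2L) {
  FUN <- match.fun(FUN)
  ## mclapply sees only an integer vector of indices; each worker
  ## extracts its element with `[[` before calling FUN.
  mclapply(seq_along(x), function(i) FUN(x[[i]], ...), mc.cores = mc.cores)
}

## A plain named list stands in for a GRangesList here.
x <- list(a = 1:3, b = 1:5)
res <- mclapply_along(x, length)
```

Because the iteration is over bare indices, the names of `x` are not carried over to the result; they would have to be restored with `names(res) <- names(x)` if needed.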

~Malcolm


> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, September 20, 2012 8:11 AM
> To: Cook, Malcolm
> Cc: 'Bioconductor Newsgroup (bioconductor at stat.math.ethz.ch)'; 'arne.mueller at novartis.com'; 'stefano.calza at med.unibs.it';
> 'barr.cory at gene.com'; 'Steve Lianoglou (mailinglist.honeypot at gmail.com)'; 'Michael Lawrence <lawrence.michael at gene.com>
> (lawrence.michael at gene.com)'; Blanchette, Marco
> Subject: Re: multicore and GRangesList [Resurrected]
>
> On 09/19/2012 09:30 AM, Cook, Malcolm wrote:
> > The question of approaches to parallelizing operations on a GRangesList was raised in this thread:
> > http://thread.gmane.org/gmane.science.biology.informatics.conductor/32799
> >
> > I find the issue still relevant when using the new `parallel` package.
> >
> > I have adopted the following practice, for which I seek your criticism or accolades.  Your choice.
> >
> > The approach is to use parallel::pvec over the indices of the GRangesList, with a little sugar in the form of...
> >
> > pvec_along <-function(x,FUN,...) {
> > ### PURPOSE: extension to parallel::pvec for non-vectors which is
> > ### vectorized over the indices of x.
> > ###
> > ### Example: pvec_along(myGRangesList,width)
> > ###
> > ### Requires: `library(functional)` `library(parallel)`
> >    indices<-seq_along(x)
> >    FUN<-match.fun(FUN)
> >    pvec(indices,Compose(Curry(`[`,x),FUN),...)
> > }
> >
> > Discuss?
>
> pvec seems conceptually relevant; the benefits of the functional stuff
> not immediately clear. Explain.
>
> >
> > Best,
> >
> > ~ Malcolm Cook
> >
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793





-- 
A model is a lie that helps you see the truth.

Howard Skipper


