[Bioc-devel] applying over GRanges and other vectors of ranges
mtmorgan at fhcrc.org
Mon Sep 24 18:08:39 CEST 2012
On 09/24/2012 07:58 AM, Cook, Malcolm wrote:
>> Did you see Malcolm Cook's post recently about fixing pvec() to automatically do this?
>> It seems like a sensible approach to me.
> Thanks Tim, I was poised to chime in...
> As Tim said, we recently discussed making pvec work with Lists (including GRangesList).
> Here is a (better?) version that does so: https://gist.github.com/3757873
> Comments? Improvements? Is it better?
> I had intended to correspond with the author of parallel on this matter. Is that you, Michael?
> Michael, I think I also have a working version of your sblapply, more or less.
> Indeed, I would not be surprised if that is what the OP really hoped for (guessing here), allowing for a parallel version.
> I think I have done it in a way that supports using multicore/parallel and possibly other back ends as Vincent observed is desirable.
> I will gist it later today for consideration.
> In the meantime, I would appreciate anyone trying, criticizing, fixing, or amending my pvec redux (above).
From looking at your earlier gist, I thought you'd identified two
important distinctions: pvec vs. mclapply, and a generic pvec vs. making
pvec work with a well-defined API.
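To make the pvec vs. mclapply distinction concrete, here is a minimal base-R sketch on a plain numeric vector (it assumes a Unix-alike, since mc.cores > 1 is not supported on Windows). mclapply() calls FUN once per element and returns a list; pvec() cuts the vector into chunks, calls a vectorized FUN once per chunk, and combines the pieces back into one vector.

```r
library(parallel)

x <- c(1, 4, 9, 16, 25, 36)

## mclapply(): FUN is called once per element; results come back as a list
by.element <- mclapply(x, sqrt, mc.cores = 2L)

## pvec(): x is split into (up to) mc.cores chunks and FUN is called once
## per chunk, so FUN must be vectorized; chunk results are combined
by.chunk <- pvec(x, sqrt, mc.cores = 2L)

## both routes give the same values, in different shapes
stopifnot(identical(unlist(by.element), by.chunk))
```

The per-chunk call is what makes pvec attractive for vectorized operations: the per-call overhead is paid a handful of times rather than once per element.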
With pvec vs. mclapply (and parallelizing over GRangesList in the first
place), it's worth keeping in mind that, at least for simple operations,
the unlist-update-relist workflow is often very fast. I'll get the
details wrong, but for instance 'disjoin' on a GRangesList is implemented as
> selectMethod(disjoin, "GRangesList")
function (x, ...)
{
    gr <- deconstructGRLintoGR(x)
    d <- disjoin(gr, ...)
    reconstructGRLfromGR(d, x)
}
Neither deconstructGRLintoGR nor reconstructGRLfromGR should be taken
literally; I'm trying to convey the general idea rather than practical
implementation advice. If this sounds like what you want to do, it would
be good to start a separate thread with some more details.
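The unlist-update-relist idea can be sketched in base R, with a plain named list standing in for a GRangesList (an assumption for illustration; no GenomicRanges code is used). The data are flattened once, a single vectorized call does the work while keyed on the element each value came from, and the result is split back into a list.

```r
## 'x' stands in for a GRangesList: a list of vectors, one per element
x <- list(a = c(3, 1, 2), b = c(5, 4), c = numeric(0), d = 7)

## unlist: flatten once, remembering which element each value came from
flat <- unlist(x, use.names = FALSE)
grp  <- factor(rep(names(x), lengths(x)), levels = names(x))

## update: one vectorized call over all values (here, sort within element)
o <- order(grp, flat)

## relist: split back by element; empty elements are kept (drop = FALSE)
sorted <- split(flat[o], grp[o])

## same result as the element-by-element lapply() approach
stopifnot(identical(sorted, lapply(x, sort)))
```

The work is done by one call to order() rather than one call to sort() per element, which is where the speed comes from when there are many small elements.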
With respect to pvec with well-defined API, to me the 'right' thing to
do is to revise pvec as you suggest, but without making additional
changes -- the minimum necessary to accomplish the goal -- and then to
communicate on the R-devel mailing list, perhaps cc'ing Simon Urbanek.
To do this effectively, it would be good to patch the R source -- we
need to get dispatch right inside the package. The way to do this is
svn co https://svn.r-project.org/R/trunk ~/src/R-devel
~/src/R-devel/tools/rsync-recommended
then build in a separate directory
mkdir -p ~/bin/R-devel && cd ~/bin/R-devel
~/src/R-devel/configure && make
then patch ~/src/R-devel/src/library/parallel and quickly rebuild with
cd ~/bin/R-devel && make
and finally present the patch as
svn diff ~/src/R-devel
>> On Sep 24, 2012, at 6:53 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>>> Florian's post about mclapply got me thinking about how it is kind of a
>>> pain to iterate over GRanges objects (since they are not Lists, there is no
>>> lapply). Could we instead have an apply function for vectors that subsets,
>>> i.e., uses [, instead of [[? Maybe sblapply for single bracket? I was also
>>> thinking it would be nice to have an apply function for Seqinfo objects
>>> that would apply over the subranges of all of the sequences, where the size
>>> of the subregion is specified by the user. Maybe call it glapply, where 'g'
>>> is for 'genome'?
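One way sblapply and glapply might look, as a hypothetical base-R sketch (neither function exists in any package; the names come from the message above and the arguments are made up here). sblapply hands FUN chunks x[i] rather than elements x[[i]], so it should work on anything with length() and [ methods, a GRanges included, though only plain vectors are exercised here. glapply tiles each sequence of a named integer vector of sequence lengths, which stands in for a Seqinfo object.

```r
library(parallel)

## Hypothetical 'sblapply' (single-bracket lapply): instead of FUN(x[[i]]),
## call FUN(x[i]) on runs of 'chunk.size' consecutive elements.
## Only length() and `[` are used, so any vector-like object should work.
sblapply <- function(x, FUN, ..., chunk.size = 1L, mc.cores = 1L) {
    n <- length(x)
    if (n == 0L)
        return(list())
    ## chunk id per element, e.g. 1 1 2 2 3 for n = 5, chunk.size = 2
    idx <- split(seq_len(n), ceiling(seq_len(n) / chunk.size))
    ## with mc.cores = 1L, mclapply() simply runs serially (any platform)
    unname(mclapply(idx, function(i) FUN(x[i], ...), mc.cores = mc.cores))
}

## Hypothetical 'glapply': apply FUN(seqname, start, end) over windows of
## 'width' that tile each sequence; 'seqlengths' is a named integer vector
## used here in place of a Seqinfo object.
glapply <- function(seqlengths, width, FUN, ..., mc.cores = 1L) {
    tiles <- do.call(rbind, lapply(names(seqlengths), function(nm) {
        len <- seqlengths[[nm]]
        start <- seq(1L, len, by = width)
        data.frame(seqname = nm, start = start,
                   end = pmin(start + width - 1L, len),
                   stringsAsFactors = FALSE)
    }))
    mclapply(seq_len(nrow(tiles)), function(i)
        FUN(tiles$seqname[i], tiles$start[i], tiles$end[i], ...),
        mc.cores = mc.cores)
}

## usage: chunk sums of a plain vector, and window widths of a toy genome
sblapply(c(5, 3, 8, 1, 9), sum, chunk.size = 2L)
glapply(c(chr1 = 25L, chr2 = 10L), width = 10L,
        function(seqname, start, end) end - start + 1L)
```

Both helpers route through mclapply(), so raising mc.cores on a Unix-alike is the only change needed to run them in parallel; another back end could be swapped in at that single call.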
>>> Bioc-devel at r-project.org mailing list
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793