[Bioc-devel] applying over GRanges and other vectors of ranges
Martin Morgan
mtmorgan at fhcrc.org
Mon Sep 24 18:08:39 CEST 2012
On 09/24/2012 07:58 AM, Cook, Malcolm wrote:
>> Did you see Malcolm Cook's post recently about fixing pvec() to automatically do this?
>>
>> It seems like a sensible approach to me
>>
>
> Thanks Tim, I was poised to chime in...
>
> As Tim said, recently discussed was making pvec work with Lists (including GRangesList) where I offer:
>
> Here is a (better?) version that does: https://gist.github.com/3757873
> Comments? Improvements? Is it better?
>
> I had intended to correspond with parallel author on this matter. Is that you Michael?
>
> Michael, I think I also have a working version of your sblapply, more or less.
>
> Indeed I would not be surprised if that is what the OP really hoped for (guessing here), allowing for a parallel version.
>
> I think I have done it in a way that supports using multicore/parallel and possibly other back ends as Vincent observed is desirable.
>
> I will gist it later today for consideration.
>
> In the meantime, I would appreciate anyone willing to try, criticize, fix, or amend my pvec redux (above).
From looking at your earlier gist, I thought you'd identified two
important distinctions -- pvec vs. mclapply, and generic pvec vs. making
pvec work with a well-defined API.
With pvec vs. mclapply (and parallelizing over GRangesList in the first
place) it's worth keeping in mind that, at least for simple operations,
the unlist-update-relist workflow is often very, very fast. I'll get the
details wrong, but for instance 'disjoin' on a GRangesList is implemented as
> selectMethod(disjoin, "GRangesList")
Method Definition:

function (x, ...)
{
    gr <- deconstructGRLintoGR(x)
    d <- disjoin(gr, ...)
    reconstructGRLfromGR(d, x)
}
<environment: namespace:GenomicRanges>

Signatures:
        x
target  "GRangesList"
Neither deconstructGRLintoGR nor reconstructGRLfromGR is exported; I'm
trying to convey the general idea rather than practical implementation
advice. If this sounds like what you want to do, then it would be good to
have that as a separate thread with some more details.
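To make the unlist-update-relist idea concrete, here is a minimal base-R sketch. It rests on assumptions: a plain named list of numeric vectors stands in for a GRangesList, and doubling each value stands in for the real update. The helpers unlist() and relist() are the base R / utils functions, not the GenomicRanges internals shown above, and this simple form only applies when the update keeps one output value per input value (disjoin does not, which is presumably why it needs its own helpers).

```r
# Illustrative stand-in for a GRangesList: a named list of numeric vectors.
x <- list(a = c(1, 2), b = 3, c = c(4, 5, 6))

# Per-element approach: one function call per list element.
by_element <- lapply(x, function(v) v * 2)

# unlist-update-relist: flatten once, apply a single vectorized update,
# then restore the original grouping using x as the skeleton.
flat <- unlist(x, use.names = FALSE)
updated <- flat * 2
by_relist <- relist(updated, skeleton = x)

# Both routes should give the same grouped result.
stopifnot(isTRUE(all.equal(by_element, by_relist)))
```

The point is that the vectorized route makes one call on the flattened data no matter how many list elements there are, so for simple operations it may well beat a parallel lapply.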
With respect to pvec with well-defined API, to me the 'right' thing to
do is to revise pvec as you suggest, but without making additional
changes -- the minimum necessary to accomplish the goal -- and then to
communicate on the R-devel mailing list, perhaps cc'ing Simon Urbanek.
To do this effectively, it would be good to patch the R source -- we
need to get dispatch right inside the package. The way to do this is
probably
mkdir -p ~/src
cd ~/src
svn co https://svn.r-project.org/R/trunk R-devel
cd R-devel
tools/rsync-recommended
then build in a separate directory
mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure
make -j
then patch ~/src/R-devel/src/library/parallel and quickly rebuild the binary
cd ~/bin/R-devel/src/library/parallel
make
and finally present the patch as
svn diff ~/src/R-devel
Martin
>
> Cheers,
>
> --Malcolm
>
>> --t
>>
>> On Sep 24, 2012, at 6:53 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>>
>>> Florian's post about mclapply got me thinking about how it is kind of a
>>> pain to iterate over GRanges objects (since they are not Lists, there is no
>>> lapply). Could we instead have an apply function for vectors that subsets,
>>> i.e., uses [, instead of [[? Maybe sblapply for single bracket? I was also
>>> thinking it would be nice to have an apply function for Seqinfo objects
>>> that would apply over the subranges of all of the sequences, where the size
>>> of the subregion is specified by the user. Maybe call it glapply, where 'g'
>>> is for 'genome'?
>>>
>>> Michael
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793