[Bioc-devel] applying over GRanges and other vectors of ranges

Martin Morgan mtmorgan at fhcrc.org
Mon Sep 24 18:08:39 CEST 2012


On 09/24/2012 07:58 AM, Cook, Malcolm wrote:
>> Did you see Malcolm Cook's post recently about fixing pvec() to automatically do this?
>>
>> It seems like a sensible approach to me
>>
>
> Thanks Tim, I was poised to chime in...
>
> As Tim said, recently discussed was making pvec work with Lists (including GRangesList) where I offer:
>
> 	Here is a (better?) version that does: https://gist.github.com/3757873
> 	Comments?  Improvements?  Is it better?
>
> I had intended to correspond with the author of parallel on this matter.  Is that you, Michael?
>
> Michael, I think I also have a working version of your sblapply, more or less.
>
> Indeed I would not be surprised if that is what the OP really hoped for (guessing here), allowing for a parallel version.
>
> I think I have done it in a way that supports using multicore/parallel and possibly other back ends as Vincent observed is desirable.
>
> I will gist it later today for consideration.
>
> In the meantime, I would appreciate anyone trying, criticizing, fixing, or amending my pvec redux (above).

From looking at your earlier gist, I thought you'd identified two 
important distinctions -- pvec vs. mclapply, and a generic pvec vs. making 
pvec work with a well-defined API.

With pvec vs. mclapply (and parallelizing over a GRangesList in the first 
place), it's worth keeping in mind that, at least for simple operations, 
the unlist-update-relist workflow is often very fast. I'll get the 
details wrong, but for instance 'disjoin' on a GRangesList is implemented as

 > selectMethod(disjoin, "GRangesList")
Method Definition:

function (x, ...)
{
     gr <- deconstructGRLintoGR(x)
     d <- disjoin(gr, ...)
     reconstructGRLfromGR(d, x)
}
<environment: namespace:GenomicRanges>

Signatures:
         x
target  "GRangesList"

Neither deconstructGRLintoGR nor reconstructGRLfromGR is exported, as far 
as I know; I'm trying to convey the general idea rather than practical 
implementation advice. If this sounds like what you want to do, then it 
would be good to have that as a separate thread with some more details.
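
To make the unlist-update-relist idea concrete, here is a minimal sketch in base R. It uses a plain list of integer vectors as a stand-in for a GRangesList (an assumption for illustration only; the toy data and variable names are made up). For a real GRangesList, unlist() and relist() play the same roles, but this is not how disjoin itself is implemented.

```r
# A plain list of integer vectors stands in for a GRangesList here
# (hypothetical toy data, not a real ranges object).
x <- list(a = c(1L, 10L), b = 20L)

# Element-by-element: one function call per list element.
slow <- lapply(x, function(v) v + 100L)

# unlist-update-relist: flatten once, do a single vectorized update,
# then restore the original list shape from the skeleton.
flat <- unlist(x, use.names = FALSE)
flat <- flat + 100L
fast <- relist(flat, skeleton = x)

# Both routes give the same result.
stopifnot(identical(slow, fast))
```

The point is that the vectorized route makes one call on the flattened data instead of one call per element, which is why it can beat a parallel lapply for simple operations.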

With respect to pvec with well-defined API, to me the 'right' thing to 
do is to revise pvec as you suggest, but without making additional 
changes -- the minimum necessary to accomplish the goal -- and then to 
communicate on the R-devel mailing list, perhaps cc'ing Simon Urbanek.
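
For discussion, the kind of minimal change I have in mind might look roughly like the sketch below. This is only an illustration under stated assumptions: pvec_sketch is a made-up name, it is neither the code in the parallel package nor the code in the gist, and it assumes the input supports length(), single-bracket subsetting and c(). It also ignores error handling for failed workers.

```r
library(parallel)

# Hypothetical sketch: split by index, subset with `[`, recombine with c().
# Anything with length(), `[` and c() methods should pass through unchanged.
pvec_sketch <- function(v, FUN, ...,
                        mc.cores = if (.Platform$OS.type == "windows") 1L else 2L) {
    n <- length(v)
    if (n == 0L)
        return(FUN(v, ...))
    cores <- min(mc.cores, n)
    # splitIndices() divides 1..n into 'cores' contiguous chunks of indices.
    idx <- splitIndices(n, cores)
    # Each worker sees a subset v[i] of the same class as v, not v[[i]].
    chunks <- mclapply(idx, function(i) FUN(v[i], ...), mc.cores = cores)
    # Concatenate the per-chunk results in their original order.
    do.call(c, chunks)
}

# Usage on an ordinary vector.
res <- pvec_sketch(1:10, function(x) x * 2L)
```

Because the only operations applied to v are length(), `[` and c(), the dispatch question reduces to whether those generics find the right methods from inside the parallel namespace.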

To do this effectively, it would be good to patch the R source -- we 
need to get dispatch right inside the package. The way to do this is 
probably

   mkdir -p ~/src
   cd ~/src
   svn co https://svn.r-project.org/R/trunk R-devel
   cd R-devel
   tools/rsync-recommended

then build in a separate directory

   mkdir -p ~/bin/R-devel
   cd ~/bin/R-devel
   ~/src/R-devel/configure
   make -j

then patch ~/src/R-devel/src/library/parallel and quickly rebuild the binary

   cd ~/bin/R-devel/src/library/parallel
   make

and finally present the patch as

   svn diff ~/src/R-devel

Martin

>
> Cheers,
>
> --Malcolm
>
>> --t
>>
>> On Sep 24, 2012, at 6:53 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>>
>>> Florian's post about mclapply got me thinking about how it is kind of a
>>> pain to iterate over GRanges objects (since they are not Lists, there is no
>>> lapply). Could we instead have an apply function for vectors that subsets,
>>> i.e., uses [, instead of [[? Maybe sblapply for single bracket? I was also
>>> thinking it would be nice to have an apply function for Seqinfo objects
>>> that would apply over the subranges of all of the sequences, where the size
>>> of the subregion is specified by the user. Maybe call it glapply, where 'g'
>>> is for 'genome'?
>>>
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


