[BioC] Most efficient way to compute width of overlap of multiple features

Tue Jan 7 21:30:27 CET 2014

Michael Lawrence <lawrence.michael at ...> writes:

>

Thanks for your reply and for GenomicRanges.

Please see inline comments below.

> 
> On Tue, Jan 7, 2014 at 9:30 AM, Charles Berry <ccberry at ...> wrote:
> 
> > Vince S. Buffalo <vsbuffalo <at> ...> writes:
> >
> > >
> > > Hi Bioconductor folks,
> > >

> > > cover2.gr <- subset( as( coverage(gr2)>0, "GRanges"), score)
> >
> > seqinfo(cover2.gr) <- seqinfo(gr2)
> >
> 
> I think you just want cover2.gr <- reduce(gr2) instead of the above two
> lines.

Right. I should have recalled that.
> 
> > > hits <- findOverlaps(gr,cover2.gr)
> > > gr.over <- 
> > + pintersect(gr[queryHits(hits)],cover2.gr[subjectHits(hits)])
> >
> 
> w <- width(ranges(hits, ranges(gr), ranges(cover2.gr))) *should* work as >
an alternative to the pintersect call.
> 

It does, but I cannot imagine I would ever have figured this idiom out.

Maybe add an example to one of the help pages??

> > > gr.counts <- tapply(gr.over,queryHits(hits),FUN=function(x)
> > sum(width(x)))
> >
> 
> This tapply is going to kill performance. Use something like:
> 
> gr$overlap <- sum(splitAsList(w, factor(queryHits(hits),
> seq_len(queryLength(hits)))))
> 

Awesome!

AFAICS, splitAsList has no help page. (It is used but not described
in the GenomicRanges HOWTOs vignette.)

I find GenomicRanges is an essential toolkit these days and would like
to be able to figure out idioms like these.

Anything that can be done to make idioms like these easier to find
and use is much appreciated.

> Or even:
> 
> gr$overlap <- sum(relist(w, as(hits, "List")))
> 

Ditto the last comment, I think.

Chuck