[Bioc-devel] mapping between original and reduced ranges
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Thu Mar 15 13:44:37 CET 2012
So the key question is to what extent keeping track of where the
ranges comes from would slow down the reduce operation. I am not
familiar enough with the algorithm to know this, but given how fast
IRanges is in general, I am not one for guessing on this.
I agree with Florian that this is a very typical use case.
On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
<florian.hahne at novartis.com> wrote:
> Hi all,
> It is true that this is not terribly slow when you deal with fairly large
> range objects:
> foo <- GRanges(seqnames=sample(1:4, 1e6, TRUE),
> ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)), width=50))
> system.time(bar <- reduce(foo))
> user system elapsed
> 0.918 0.174 1.091
> system.time(foobar <- findOverlaps(foo, bar))
> user system elapsed
> 2.051 0.402 2.453
> However the whole process does take about 3x the time of just the reduce
> operation, and in my use case I want this to happen interactively, where
> waiting 3 seconds compared to 1 makes a huge difference...
> I wouldn't push this high up on the development agenda, but it seems to be
> something that is already 95% existing and could easily be added. But
> maybe I am wrong...
> Florian Hahne
> Novartis Institute For Biomedical Research
> Translational Sciences / Preclinical Safety / PCS Informatics
> Expert Data Integration and Modeling Bioinformatics
> CHBS, WKL-135.2.26
> Novartis Institute For Biomedical Research, Werk Klybeck
> Klybeckstrasse 141
> CH-4057 Basel
> Phone: +41 61 6967127
> Email : florian.hahne at novartis.com
> On 3/14/12 9:40 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
>>We have discussed this a couple of times. I routinely uses the reduce
>>followed by findOverlaps paradigm. As Malcolm says it feels wrong,
>>but from a practical point of view it is pretty fast, so I stopped
>>worrying about it. I only think there is a reason to do this, if it
>>is substantially faster.
>>On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>>> Chiming in....
>>> on a similar note....
>>> A version of `disjoin` which returns a Hits/RangesMapping additional to
>>>the GRanges result would be most useful and probably not require much
>>>additional effort (assuming `disjoin` computes this internally)
>>> Of course, it is easy to live without since I can just perform the
>>>findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
>>>> -----Original Message-----
>>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-
>>>> project.org] On Behalf Of Hahne, Florian
>>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>>> To: bioc-devel at r-project.org
>>>> Subject: [Bioc-devel] mapping between original and reduced ranges
>>>> This bounced before, guess the mailing list does not like HTML mails.
>>>> one more try:
>>>> I had the following offline discussion with Michael about how one could
>>>> retain a mapping of the ranges in a GRanges object before and after
>>>> reduce. He suggested to take it to the list. Is that something that
>>>> be added to GenomicRanges/IRanges?
>>>> I have a slightly tricky application for which I need to reduce a
>>>> object, but I would like to be able to process some of the original
>>>> elementMetadata of the merged ranges later. The only way I was able to
>>>> figure out which of the original ranges correspond to the merged ranges
>>>> was to perform a findOverlaps operation, but of course that is rather
>>>> costly. Is there a way to get the merge information out of the original
>>>> reduce call?
>>>> Here is a brief example:
>>>> gr <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1,6,12,24,27),
>>>> width=5), foo=1:5, bar=letters[1:5])
>>>> gr2 <- reduce(gr, min.gapwidth=1)
>>>> ind <- queryHits(findOverlaps(gr2, gr))
>>>> split(values(gr), ind)
>>>> Unfortunately, this is the idiom. I could see an improvement where
>>>> or a similarly named function would return a Hits object (in addition
>>>> the actual reduce result) that would indicate the mapping between the
>>>> input and reduced ranges. The RangesMapping structure would be really
>>>> close to what we would need.
>>>> Bioc-devel at r-project.org mailing list
>>> Bioc-devel at r-project.org mailing list
More information about the Bioc-devel