[Bioc-devel] mapping between original and reduced ranges
Hahne, Florian
florian.hahne at novartis.com
Thu Mar 15 10:02:17 CET 2012
Hi all,
It is true that this is not terribly slow when you deal with fairly large
range objects:
foo <- GRanges(seqnames=sample(1:4, 1e6, TRUE),
ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)), width=50))
system.time(bar <- reduce(foo))
user system elapsed
0.918 0.174 1.091
system.time(foobar <- findOverlaps(foo, bar))
user system elapsed
2.051 0.402 2.453
However the whole process does take about 3x the time of just the reduce
operation, and in my use case I want this to happen interactively, where
waiting 3 seconds compared to 1 makes a huge difference...
I wouldn't push this high up on the development agenda, but it seems to be
something that is already 95% existing and could easily be added. But
maybe I am wrong...
Florian
Florian Hahne
Novartis Institute For Biomedical Research
Translational Sciences / Preclinical Safety / PCS Informatics
Expert Data Integration and Modeling Bioinformatics
CHBS, WKL-135.2.26
Novartis Institute For Biomedical Research, Werk Klybeck
Klybeckstrasse 141
CH-4057 Basel
Switzerland
Phone: +41 61 6967127
Email : florian.hahne at novartis.com
On 3/14/12 9:40 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
wrote:
>We have discussed this a couple of times. I routinely uses the reduce
>followed by findOverlaps paradigm. As Malcolm says it feels wrong,
>but from a practical point of view it is pretty fast, so I stopped
>worrying about it. I only think there is a reason to do this, if it
>is substantially faster.
>
>Kasper
>
>On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Chiming in....
>>
>> on a similar note....
>>
>> A version of `disjoin` which returns a Hits/RangesMapping additional to
>>the GRanges result would be most useful and probably not require much
>>additional effort (assuming `disjoin` computes this internally)
>>
>> Of course, it is easy to live without since I can just perform the
>>findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
>>
>> Ahoy!
>>
>> ~Malcolm
>>
>>
>>> -----Original Message-----
>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-
>>> project.org] On Behalf Of Hahne, Florian
>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>> To: bioc-devel at r-project.org
>>> Subject: [Bioc-devel] mapping between original and reduced ranges
>>>
>>> This bounced before, guess the mailing list does not like HTML mails.
>>>So
>>> one more try:
>>>
>>> I had the following offline discussion with Michael about how one could
>>> retain a mapping of the ranges in a GRanges object before and after
>>> reduce. He suggested to take it to the list. Is that something that
>>>could
>>> be added to GenomicRanges/IRanges?
>>> Florian
>>>
>>> I have a slightly tricky application for which I need to reduce a
>>>GRanges
>>> object, but I would like to be able to process some of the original
>>> elementMetadata of the merged ranges later. The only way I was able to
>>> figure out which of the original ranges correspond to the merged ranges
>>> was to perform a findOverlaps operation, but of course that is rather
>>> costly. Is there a way to get the merge information out of the original
>>> reduce call?
>>> Here is a brief example:
>>>
>>> gr <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1,6,12,24,27),
>>> width=5), foo=1:5, bar=letters[1:5])
>>> gr2 <- reduce(gr, min.gapwidth=1)
>>> ind <- queryHits(findOverlaps(gr2, gr))
>>> split(values(gr), ind)
>>>
>>>
>>> Unfortunately, this is the idiom. I could see an improvement where
>>>reduce
>>> or a similarly named function would return a Hits object (in addition
>>>to
>>> the actual reduce result) that would indicate the mapping between the
>>> input and reduced ranges. The RangesMapping structure would be really
>>> close to what we would need.
>>>
>>> Michael
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list