[Bioc-devel] mapping between original and reduced ranges

Hahne, Florian florian.hahne at novartis.com
Thu Mar 15 10:02:17 CET 2012


Hi all,
It is true that this is not terribly slow, even when you deal with fairly
large range objects:

foo <- GRanges(seqnames=sample(1:4, 1e6, TRUE),
ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)), width=50))
system.time(bar <- reduce(foo))
   user  system elapsed
  0.918   0.174   1.091

system.time(foobar <- findOverlaps(foo, bar))
   user  system elapsed
  2.051   0.402   2.453


However, the whole process (reduce plus findOverlaps) does take about 3x
the time of just the reduce operation, and in my use case I want this to
happen interactively, where waiting 3 seconds compared to 1 makes a huge
difference...

I wouldn't push this high up on the development agenda, but it seems to be
something that is already 95% there and could easily be added. But
maybe I am wrong...
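
To illustrate why the mapping looks like a byproduct of the merge itself,
here is a minimal base-R sketch for a single sequence, ignoring strand. It
sorts the ranges, sweeps once, and hands back the index of the merged range
for every input range. The function name reduce_with_map and its return
layout are made up for illustration, and this is an assumption about how
such a merge could work, not a description of what reduce does internally.

```r
# Hypothetical helper: merge ranges and keep the input -> merged mapping.
# Ranges separated by a gap smaller than min_gapwidth are merged, so the
# default of 1 also merges directly adjacent ranges.
reduce_with_map <- function(start, end, min_gapwidth = 1L) {
  stopifnot(length(start) == length(end), all(start <= end))
  n <- length(start)
  if (n == 0L) {
    return(list(start = start, end = end, group = integer(0)))
  }
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  # Furthest end seen so far while sweeping left to right.
  run_end <- cummax(e)
  # A new merged range begins where the gap to everything before is too wide.
  is_new <- c(TRUE, s[-1L] > run_end[-n] + min_gapwidth)
  grp_sorted <- cumsum(is_new)
  # Undo the sort so that group[i] refers to the i-th input range.
  group <- integer(n)
  group[ord] <- grp_sorted
  # The last element of each group carries that group's furthest end.
  last_in_group <- c(which(is_new)[-1L] - 1L, n)
  list(start = s[is_new], end = run_end[last_in_group], group = group)
}

# Same toy ranges as in the example quoted further down (width 5).
st <- c(1, 6, 12, 24, 27)
res <- reduce_with_map(st, st + 4)
meta <- data.frame(foo = 1:5, bar = letters[1:5])
by_merged <- split(meta, res$group)
```

Here res$group plays the role of the Hits object: no second overlap search
is needed to split the metadata by merged range.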

Florian




Florian Hahne
Novartis Institute For Biomedical Research
Translational Sciences / Preclinical Safety / PCS Informatics
Expert Data Integration and Modeling Bioinformatics
CHBS, WKL-135.2.26
Novartis Institute For Biomedical Research, Werk Klybeck
Klybeckstrasse 141
CH-4057 Basel
Switzerland
Phone: +41 61 6967127
Email : florian.hahne at novartis.com







On 3/14/12 9:40 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
wrote:

>We have discussed this a couple of times.  I routinely use the reduce
>followed by findOverlaps paradigm.  As Malcolm says it feels wrong,
>but from a practical point of view it is pretty fast, so I stopped
>worrying about it.  I think there is only a reason to do this if it
>is substantially faster.
>
>Kasper
>
>On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Chiming in....
>>
>> on a similar note....
>>
>> A version of `disjoin` which returns a Hits/RangesMapping in addition to
>> the GRanges result would be most useful and probably not require much
>> additional effort (assuming `disjoin` computes this internally)
>>
>> Of course, it is easy to live without since I can just perform the
>> findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
>>
>> Ahoy!
>>
>> ~Malcolm
>>
>>
>>> -----Original Message-----
>>> From: bioc-devel-bounces at r-project.org
>>> [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Hahne, Florian
>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>> To: bioc-devel at r-project.org
>>> Subject: [Bioc-devel] mapping between original and reduced ranges
>>>
>>> This bounced before, guess the mailing list does not like HTML mails.
>>> So one more try:
>>>
>>> I had the following offline discussion with Michael about how one could
>>> retain a mapping of the ranges in a GRanges object before and after
>>> reduce. He suggested to take it to the list. Is that something that
>>> could be added to GenomicRanges/IRanges?
>>> Florian
>>>
>>> I have a slightly tricky application for which I need to reduce a
>>> GRanges object, but I would like to be able to process some of the
>>> original elementMetadata of the merged ranges later. The only way I was
>>> able to figure out which of the original ranges correspond to the merged
>>> ranges was to perform a findOverlaps operation, but of course that is
>>> rather costly. Is there a way to get the merge information out of the
>>> original reduce call?
>>> Here is a brief example:
>>>
>>> gr <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1,6,12,24,27),
>>> width=5), foo=1:5, bar=letters[1:5])
>>> gr2 <- reduce(gr, min.gapwidth=1)
>>> hits <- findOverlaps(gr2, gr)
>>> # index rows by subjectHits so they line up even if gr is unsorted
>>> split(values(gr)[subjectHits(hits), ], queryHits(hits))
>>>
>>>
>>> Unfortunately, this is the idiom. I could see an improvement where
>>> reduce or a similarly named function would return a Hits object (in
>>> addition to the actual reduce result) that would indicate the mapping
>>> between the input and reduced ranges. The RangesMapping structure would
>>> be really close to what we would need.
>>>
>>> Michael
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


