[Bioc-devel] mapping between original and reduced ranges

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Mar 15 22:40:24 CET 2012


I'll vote against the attribute solution and for a solution where the
type of return object gets changed, for example into a list.

Kasper

2012/3/15 Hervé Pagès <hpages at fhcrc.org>:
> On 03/15/2012 12:45 PM, Cook, Malcolm wrote:
>>
>> Hi Herve,
>>
>> I've not used attributes to return values before.
>>
>> I guess it would work, and I won't object further if you do it this way,
>> but, since you asked
>>
>> Again, it "feels wrong" in violating functional
>>
>> I suspect there may be issues with memory management.  When does the
>> attribute get gc-ed?  When the object does?  If so, then, retaining the
>> attribute in memory when not needed _could_ be a burden, no?
>>
>> Back in my lisp days, this is when I would use `values` and
>> `multiple-value-bind` (and friends) when I wanted a function to (optionally)
>> return multiple values.
>>
>> But this is R.
>>
>> Would you consider returning instead a list of values, keyed by `value`
>> and `hits`, but only when with.hits
>>
>> BTW: with.inframe.attrib is documented as 'For internal use'.  What does
>> it return in the attr?
>
>
> AFAIK, it's only supported by the "reduce" methods for IRanges objects.
>
> The "inframe" attribute contains an IRanges object of the same length as
> the input. For each range in the input it tells you the position of
> that range with respect to the "frame" i.e. the space obtained by
> pasting together the ranges in the reduce object:
>
>
>  > ir
>  IRanges of length 5
>      start end width
>  [1]    24  28     5
>  [2]    27  31     5
>  [3]     1   5     5
>  [4]     6  10     5
>  [5]    12  16     5
>
>  > ir2 <- reduce(ir, with.inframe.attrib=TRUE)
>  > ir2
>  IRanges of length 3
>      start end width
>  [1]     1  10    10
>  [2]    12  16     5
>  [3]    24  31     8
>  > attr(ir2, "inframe")
>  IRanges of length 5
>      start end width
>  [1]    16  20     5
>  [2]    19  23     5
>  [3]     1   5     5
>  [4]     6  10     5
>  [5]    11  15     5
>
>
>                  1    1    2    2    3
>         1...5....0....5....0....5....0. <- standard coordinate system
>  ir[1]                         xxxxx
>  ir[2]                            xxxxx
>  ir[3]  xxxxx
>  ir[4]       xxxxx
>  ir[5]             xxxxx
>
>  ir2:   xxxxxxxxxx xxxxx       xxxxxxxx
>
>         1...5....1 ....1       ....2... <- "frame" coordinate system
>                  0     5           0
>
> I'll document this.
>
> H.
>
>
>>
>> Thanks for listening!
>>
>> ~Malcolm
>>
>>
>>> -----Original Message-----
>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-
>>> project.org] On Behalf Of Hervé Pagès
>>> Sent: Thursday, March 15, 2012 1:55 PM
>>> To: Kasper Daniel Hansen
>>> Cc: bioc-devel at r-project.org
>>> Subject: Re: [Bioc-devel] mapping between original and reduced ranges
>>>
>>> Hi reducers,
>>>
>>> I agree it "feels wrong" to use findOverlaps() to extract the mapping
>>> from original to reduced ranges. Even if it can be computed very easily
>>> with:
>>>
>>>    findOverlaps(gr, reduce(gr), select="first")
>>>
>>> (Note that using 'queryHits(findOverlaps(reduce(gr), gr))' only produces
>>> the correct result if 'gr' is already sorted by increasing order.)
>>>
>>> I think it would be easy for reduce() internal code to produce this
>>> mapping. The question is: how do we give it back to the user?
>>>
>>> Is it OK to use an attribute for this? reduce() already uses this
>>> for returning some extra information about the reduction:
>>>
>>>    >  ir
>>>    IRanges of length 5
>>>        start end width
>>>    [1]     1   5     5
>>>    [2]     6  10     5
>>>    [3]    12  16     5
>>>    [4]    24  28     5
>>>    [5]    27  31     5
>>>    >  ir2<- reduce(ir, with.inframe.attrib=TRUE)
>>>    >  ir2
>>>    IRanges of length 3
>>>        start end width
>>>    [1]     1  10    10
>>>    [2]    12  16     5
>>>    [3]    24  31     8
>>>    >  attr(ir2, "inframe")
>>>    IRanges of length 5
>>>        start end width
>>>    [1]     1   5     5
>>>    [2]     6  10     5
>>>    [3]    11  15     5
>>>    [4]    16  20     5
>>>    [5]    19  23     5
>>>
>>> We could to the same thing for the mapping from original to reduced
>>> ranges with e.g. an argument called 'with.mapping.attrib'.
>>> Would that work?
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>> On 03/15/2012 05:44 AM, Kasper Daniel Hansen wrote:
>>>>
>>>> So the key question is to what extent keeping track of where the
>>>> ranges comes from would slow down the reduce operation.  I am not
>>>> familiar enough with the algorithm to know this, but given how fast
>>>> IRanges is in general, I am not one for guessing on this.
>>>>
>>>> I agree with Florian that this is a very typical use case.
>>>>
>>>> Kasper
>>>>
>>>> On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
>>>> <florian.hahne at novartis.com>   wrote:
>>>>>
>>>>> Hi all,
>>>>> It is true that this is not terribly slow when you deal with fairly
>>>>> large
>>>>> range objects:
>>>>>
>>>>> foo<- GRanges(seqnames=sample(1:4, 1e6, TRUE),
>>>>> ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)),
>>>
>>> width=50))
>>>>>
>>>>> system.time(bar<- reduce(foo))
>>>>>    user  system elapsed
>>>>>   0.918   0.174   1.091
>>>>>
>>>>> system.time(foobar<- findOverlaps(foo, bar))
>>>>>    user  system elapsed
>>>>>   2.051   0.402   2.453
>>>>>
>>>>>
>>>>> However the whole process does take about 3x the time of just the
>>>
>>> reduce
>>>>>
>>>>> operation, and in my use case I want this to happen interactively,
>>>>> where
>>>>> waiting 3 seconds compared to 1 makes a huge difference...
>>>>>
>>>>> I wouldn't push this high up on the development agenda, but it seems to
>>>
>>> be
>>>>>
>>>>> something that is already 95% existing and could easily be added. But
>>>>> maybe I am wrong...
>>>>>
>>>>> Florian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Florian Hahne
>>>>> Novartis Institute For Biomedical Research
>>>>> Translational Sciences / Preclinical Safety / PCS Informatics
>>>>> Expert Data Integration and Modeling Bioinformatics
>>>>> CHBS, WKL-135.2.26
>>>>> Novartis Institute For Biomedical Research, Werk Klybeck
>>>>> Klybeckstrasse 141
>>>>> CH-4057 Basel
>>>>> Switzerland
>>>>> Phone: +41 61 6967127
>>>>> Email : florian.hahne at novartis.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 3/14/12 9:40 PM, "Kasper Daniel
>>>
>>> Hansen"<kasperdanielhansen at gmail.com>
>>>>>
>>>>> wrote:
>>>>>
>>>>>> We have discussed this a couple of times.  I routinely uses the reduce
>>>>>> followed by findOverlaps paradigm.  As Malcolm says it feels wrong,
>>>>>> but from a practical point of view it is pretty fast, so I stopped
>>>>>> worrying about it.  I only think there is a reason to do this, if it
>>>>>> is substantially faster.
>>>>>>
>>>>>> Kasper
>>>>>>
>>>>>> On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm<MEC at stowers.org>
>>>
>>> wrote:
>>>>>>>
>>>>>>> Chiming in....
>>>>>>>
>>>>>>> on a similar note....
>>>>>>>
>>>>>>> A version of `disjoin` which returns a Hits/RangesMapping additional
>>>>>>> to
>>>>>>> the GRanges result would be most useful  and probably not require
>>>
>>> much
>>>>>>>
>>>>>>> additional effort (assuming `disjoin` computes this internally)
>>>>>>>
>>>>>>> Of course, it is easy to live without since I can just perform the
>>>>>>> findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
>>>>>>>
>>>>>>> Ahoy!
>>>>>>>
>>>>>>> ~Malcolm
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-
>>>
>>> bounces at r-
>>>>>>>>
>>>>>>>> project.org] On Behalf Of Hahne, Florian
>>>>>>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>>>>>>> To: bioc-devel at r-project.org
>>>>>>>> Subject: [Bioc-devel] mapping between original and reduced ranges
>>>>>>>>
>>>>>>>> This bounced before, guess the mailing list does not like HTML
>>>>>>>> mails.
>>>>>>>> So
>>>>>>>> one more try:
>>>>>>>>
>>>>>>>> I had the following offline discussion with Michael about how one
>>>
>>> could
>>>>>>>>
>>>>>>>> retain a mapping of the ranges in a GRanges object before and after
>>>>>>>> reduce. He suggested to take it to the list. Is that something that
>>>>>>>> could
>>>>>>>> be added to GenomicRanges/IRanges?
>>>>>>>> Florian
>>>>>>>>
>>>>>>>> I have a slightly tricky application for which I need to reduce a
>>>>>>>> GRanges
>>>>>>>> object, but I would like to be able to process some of the original
>>>>>>>> elementMetadata of the merged ranges later. The only way I was
>>>
>>> able to
>>>>>>>>
>>>>>>>> figure out which of the original ranges correspond to the merged
>>>
>>> ranges
>>>>>>>>
>>>>>>>> was to perform a findOverlaps operation, but of course that is
>>>>>>>> rather
>>>>>>>> costly. Is there a way to get the merge information out of the
>>>>>>>> original
>>>>>>>> reduce call?
>>>>>>>> Here is a brief example:
>>>>>>>>
>>>>>>>> gr<- GRanges(seqnames="chr1",
>>>
>>> ranges=IRanges(start=c(1,6,12,24,27),
>>>>>>>>
>>>>>>>> width=5), foo=1:5, bar=letters[1:5])
>>>>>>>> gr2<- reduce(gr, min.gapwidth=1)
>>>>>>>> ind<- queryHits(findOverlaps(gr2, gr))
>>>>>>>> split(values(gr), ind)
>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately, this is the idiom. I could see an improvement where
>>>>>>>> reduce
>>>>>>>> or a similarly named function would return a Hits object (in
>>>>>>>> addition
>>>>>>>> to
>>>>>>>> the actual reduce result) that would indicate the mapping between
>>>
>>> the
>>>>>>>>
>>>>>>>> input and reduced ranges. The RangesMapping structure would be
>>>
>>> really
>>>>>>>>
>>>>>>>> close to what we would need.
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319



More information about the Bioc-devel mailing list