[Bioc-devel] mapping between original and reduced ranges

Cook, Malcolm MEC at stowers.org
Thu Mar 15 20:45:13 CET 2012


Hi Herve,

I've not used attributes to return values before. 

I guess it would work, and I won't object further if you do it this way, but, since you asked

Again, it "feels wrong" in violating functional

I suspect there may be issues with memory management.  When does the attribute get gc-ed?  When the object does?  If so, then, retaining the attribute in memory when not needed _could_ be a burden, no?

Back in my lisp days, this is when I would use `values` and `multiple-value-bind` (and friends) when I wanted a function to (optionally) return multiple values.  

But this is R.

Would you consider returning instead a list of values, keyed by `value` and `hits`, but only when with.hits 

BTW: with.inframe.attrib is documented as 'For internal use'.  What does it return in the attr?

Thanks for listening!

~Malcolm


> -----Original Message-----
> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-
> project.org] On Behalf Of Hervé Pagès
> Sent: Thursday, March 15, 2012 1:55 PM
> To: Kasper Daniel Hansen
> Cc: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] mapping between original and reduced ranges
> 
> Hi reducers,
> 
> I agree it "feels wrong" to use findOverlaps() to extract the mapping
> from original to reduced ranges. Even if it can be computed very easily
> with:
> 
>    findOverlaps(gr, reduce(gr), select="first")
> 
> (Note that using 'queryHits(findOverlaps(reduce(gr), gr))' only produces
> the correct result if 'gr' is already sorted by increasing order.)
> 
> I think it would be easy for reduce() internal code to produce this
> mapping. The question is: how do we give it back to the user?
> 
> Is it OK to use an attribute for this? reduce() already uses this
> for returning some extra information about the reduction:
> 
>    > ir
>    IRanges of length 5
>        start end width
>    [1]     1   5     5
>    [2]     6  10     5
>    [3]    12  16     5
>    [4]    24  28     5
>    [5]    27  31     5
>    > ir2 <- reduce(ir, with.inframe.attrib=TRUE)
>    > ir2
>    IRanges of length 3
>        start end width
>    [1]     1  10    10
>    [2]    12  16     5
>    [3]    24  31     8
>    > attr(ir2, "inframe")
>    IRanges of length 5
>        start end width
>    [1]     1   5     5
>    [2]     6  10     5
>    [3]    11  15     5
>    [4]    16  20     5
>    [5]    19  23     5
> 
> We could to the same thing for the mapping from original to reduced
> ranges with e.g. an argument called 'with.mapping.attrib'.
> Would that work?
> 
> Cheers,
> H.
> 
> 
> On 03/15/2012 05:44 AM, Kasper Daniel Hansen wrote:
> > So the key question is to what extent keeping track of where the
> > ranges comes from would slow down the reduce operation.  I am not
> > familiar enough with the algorithm to know this, but given how fast
> > IRanges is in general, I am not one for guessing on this.
> >
> > I agree with Florian that this is a very typical use case.
> >
> > Kasper
> >
> > On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
> > <florian.hahne at novartis.com>  wrote:
> >> Hi all,
> >> It is true that this is not terribly slow when you deal with fairly large
> >> range objects:
> >>
> >> foo<- GRanges(seqnames=sample(1:4, 1e6, TRUE),
> >> ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)),
> width=50))
> >> system.time(bar<- reduce(foo))
> >>    user  system elapsed
> >>   0.918   0.174   1.091
> >>
> >> system.time(foobar<- findOverlaps(foo, bar))
> >>    user  system elapsed
> >>   2.051   0.402   2.453
> >>
> >>
> >> However the whole process does take about 3x the time of just the
> reduce
> >> operation, and in my use case I want this to happen interactively, where
> >> waiting 3 seconds compared to 1 makes a huge difference...
> >>
> >> I wouldn't push this high up on the development agenda, but it seems to
> be
> >> something that is already 95% existing and could easily be added. But
> >> maybe I am wrong...
> >>
> >> Florian
> >>
> >>
> >>
> >>
> >> Florian Hahne
> >> Novartis Institute For Biomedical Research
> >> Translational Sciences / Preclinical Safety / PCS Informatics
> >> Expert Data Integration and Modeling Bioinformatics
> >> CHBS, WKL-135.2.26
> >> Novartis Institute For Biomedical Research, Werk Klybeck
> >> Klybeckstrasse 141
> >> CH-4057 Basel
> >> Switzerland
> >> Phone: +41 61 6967127
> >> Email : florian.hahne at novartis.com
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/14/12 9:40 PM, "Kasper Daniel
> Hansen"<kasperdanielhansen at gmail.com>
> >> wrote:
> >>
> >>> We have discussed this a couple of times.  I routinely uses the reduce
> >>> followed by findOverlaps paradigm.  As Malcolm says it feels wrong,
> >>> but from a practical point of view it is pretty fast, so I stopped
> >>> worrying about it.  I only think there is a reason to do this, if it
> >>> is substantially faster.
> >>>
> >>> Kasper
> >>>
> >>> On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm<MEC at stowers.org>
> wrote:
> >>>> Chiming in....
> >>>>
> >>>> on a similar note....
> >>>>
> >>>> A version of `disjoin` which returns a Hits/RangesMapping additional to
> >>>> the GRanges result would be most useful  and probably not require
> much
> >>>> additional effort (assuming `disjoin` computes this internally)
> >>>>
> >>>> Of course, it is easy to live without since I can just perform the
> >>>> findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
> >>>>
> >>>> Ahoy!
> >>>>
> >>>> ~Malcolm
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-
> bounces at r-
> >>>>> project.org] On Behalf Of Hahne, Florian
> >>>>> Sent: Wednesday, March 14, 2012 2:22 PM
> >>>>> To: bioc-devel at r-project.org
> >>>>> Subject: [Bioc-devel] mapping between original and reduced ranges
> >>>>>
> >>>>> This bounced before, guess the mailing list does not like HTML mails.
> >>>>> So
> >>>>> one more try:
> >>>>>
> >>>>> I had the following offline discussion with Michael about how one
> could
> >>>>> retain a mapping of the ranges in a GRanges object before and after
> >>>>> reduce. He suggested to take it to the list. Is that something that
> >>>>> could
> >>>>> be added to GenomicRanges/IRanges?
> >>>>> Florian
> >>>>>
> >>>>> I have a slightly tricky application for which I need to reduce a
> >>>>> GRanges
> >>>>> object, but I would like to be able to process some of the original
> >>>>> elementMetadata of the merged ranges later. The only way I was
> able to
> >>>>> figure out which of the original ranges correspond to the merged
> ranges
> >>>>> was to perform a findOverlaps operation, but of course that is rather
> >>>>> costly. Is there a way to get the merge information out of the original
> >>>>> reduce call?
> >>>>> Here is a brief example:
> >>>>>
> >>>>> gr<- GRanges(seqnames="chr1",
> ranges=IRanges(start=c(1,6,12,24,27),
> >>>>> width=5), foo=1:5, bar=letters[1:5])
> >>>>> gr2<- reduce(gr, min.gapwidth=1)
> >>>>> ind<- queryHits(findOverlaps(gr2, gr))
> >>>>> split(values(gr), ind)
> >>>>>
> >>>>>
> >>>>> Unfortunately, this is the idiom. I could see an improvement where
> >>>>> reduce
> >>>>> or a similarly named function would return a Hits object (in addition
> >>>>> to
> >>>>> the actual reduce result) that would indicate the mapping between
> the
> >>>>> input and reduced ranges. The RangesMapping structure would be
> really
> >>>>> close to what we would need.
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> 
> --
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list