[Bioc-devel] mapping between original and reduced ranges
Cook, Malcolm
MEC at stowers.org
Thu Mar 15 20:45:13 CET 2012
Hi Herve,
I've not used attributes to return values before.
I guess it would work, and I won't object further if you do it this way, but, since you asked
Again, it "feels wrong" in violating functional
I suspect there may be issues with memory management. When does the attribute get gc-ed? When the object does? If so, then, retaining the attribute in memory when not needed _could_ be a burden, no?
Back in my Lisp days, I would use `values` and `multiple-value-bind` (and friends) when I wanted a function to (optionally) return multiple values.
But this is R.
Would you consider returning instead a list of values, keyed by `value` and `hits`, but only when with.hits
BTW: with.inframe.attrib is documented as 'For internal use'. What does it return in the attr?
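
To make the list idea concrete, here is a minimal base-R sketch of the shape I have in mind. It is not how IRanges implements reduce(); `reduce_sketch` and its `with.hits` argument are hypothetical names used only for illustration, and ranges are plain start/end vectors rather than IRanges objects. It merges overlapping or adjacent ranges (as reduce() does in the example quoted below) and returns the `value`/`hits` list only when asked.

```r
# Hypothetical helper for illustration only; not part of IRanges/GenomicRanges.
# Merges overlapping or adjacent integer ranges given as start/end vectors.
reduce_sketch <- function(start, end, with.hits = FALSE) {
  stopifnot(length(start) >= 1L,
            length(start) == length(end),
            all(start <= end))
  n <- length(start)
  ord <- order(start, end)      # process the ranges in sorted order
  s <- start[ord]
  e <- end[ord]
  run_end <- cummax(e)          # furthest end seen so far
  # A new reduced range begins where a start lies more than one position
  # beyond everything seen so far (adjacent ranges are merged).
  is_new <- c(TRUE, s[-1L] > run_end[-n] + 1L)
  group <- cumsum(is_new)       # reduced-range index, in sorted order
  # The last member of each group carries that group's maximal end.
  last_in_group <- c(which(is_new)[-1L] - 1L, n)
  value <- data.frame(start = s[is_new], end = run_end[last_in_group])
  if (!with.hits) {
    return(value)               # plain result, nothing extra retained
  }
  hits <- integer(n)
  hits[ord] <- group            # map back to the original input order
  list(value = value, hits = hits)
}

# The ranges from the example in this thread (width 5).
st <- c(1, 6, 12, 24, 27)
res <- reduce_sketch(st, st + 4, with.hits = TRUE)
res$value
res$hits        # 1 1 2 3 3

# Unsorted input still maps each original range to its reduced range.
st2 <- c(24, 1, 27, 6, 12)
reduce_sketch(st2, st2 + 4, with.hits = TRUE)$hits   # 3 1 3 1 2
```

With this shape the caller who does not ask for the mapping gets the plain result and nothing is kept around, which is the memory concern I raised above. The cost is that the return type changes with the argument.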
Thanks for listening!
~Malcolm
> -----Original Message-----
> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org]
> On Behalf Of Hervé Pagès
> Sent: Thursday, March 15, 2012 1:55 PM
> To: Kasper Daniel Hansen
> Cc: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] mapping between original and reduced ranges
>
> Hi reducers,
>
> I agree it "feels wrong" to use findOverlaps() to extract the mapping
> from original to reduced ranges. Even if it can be computed very easily
> with:
>
> findOverlaps(gr, reduce(gr), select="first")
>
> (Note that using 'queryHits(findOverlaps(reduce(gr), gr))' only produces
> the correct result if 'gr' is already sorted by increasing order.)
>
> I think it would be easy for reduce() internal code to produce this
> mapping. The question is: how do we give it back to the user?
>
> Is it OK to use an attribute for this? reduce() already uses this
> for returning some extra information about the reduction:
>
> > ir
> IRanges of length 5
>     start end width
> [1]     1   5     5
> [2]     6  10     5
> [3]    12  16     5
> [4]    24  28     5
> [5]    27  31     5
> > ir2 <- reduce(ir, with.inframe.attrib=TRUE)
> > ir2
> IRanges of length 3
>     start end width
> [1]     1  10    10
> [2]    12  16     5
> [3]    24  31     8
> > attr(ir2, "inframe")
> IRanges of length 5
>     start end width
> [1]     1   5     5
> [2]     6  10     5
> [3]    11  15     5
> [4]    16  20     5
> [5]    19  23     5
>
> We could do the same thing for the mapping from original to reduced
> ranges with e.g. an argument called 'with.mapping.attrib'.
> Would that work?
>
> Cheers,
> H.
>
>
> On 03/15/2012 05:44 AM, Kasper Daniel Hansen wrote:
> > So the key question is to what extent keeping track of where the
> > ranges come from would slow down the reduce operation. I am not
> > familiar enough with the algorithm to know this, but given how fast
> > IRanges is in general, I am not one for guessing on this.
> >
> > I agree with Florian that this is a very typical use case.
> >
> > Kasper
> >
> > On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
> > <florian.hahne at novartis.com> wrote:
> >> Hi all,
> >> It is true that this is not terribly slow when you deal with fairly large
> >> range objects:
> >>
> >> foo <- GRanges(seqnames=sample(1:4, 1e6, TRUE),
> >>                ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)), width=50))
> >> system.time(bar <- reduce(foo))
> >>    user  system elapsed
> >>   0.918   0.174   1.091
> >>
> >> system.time(foobar <- findOverlaps(foo, bar))
> >>    user  system elapsed
> >>   2.051   0.402   2.453
> >>
> >>
> >> However the whole process does take about 3x the time of just the reduce
> >> operation, and in my use case I want this to happen interactively, where
> >> waiting 3 seconds compared to 1 makes a huge difference...
> >>
> >> I wouldn't push this high up on the development agenda, but it seems to be
> >> something that is already 95% existing and could easily be added. But
> >> maybe I am wrong...
> >>
> >> Florian
> >>
> >>
> >>
> >>
> >> Florian Hahne
> >> Novartis Institute For Biomedical Research
> >> Translational Sciences / Preclinical Safety / PCS Informatics
> >> Expert Data Integration and Modeling Bioinformatics
> >> CHBS, WKL-135.2.26
> >> Novartis Institute For Biomedical Research, Werk Klybeck
> >> Klybeckstrasse 141
> >> CH-4057 Basel
> >> Switzerland
> >> Phone: +41 61 6967127
> >> Email : florian.hahne at novartis.com
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/14/12 9:40 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com>
> >> wrote:
> >>
> >>> We have discussed this a couple of times. I routinely use the reduce
> >>> followed by findOverlaps paradigm. As Malcolm says it feels wrong,
> >>> but from a practical point of view it is pretty fast, so I stopped
> >>> worrying about it. I only think there is a reason to do this, if it
> >>> is substantially faster.
> >>>
> >>> Kasper
> >>>
> >>> On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> >>>> Chiming in....
> >>>>
> >>>> on a similar note....
> >>>>
> >>>> A version of `disjoin` which returns a Hits/RangesMapping in addition to
> >>>> the GRanges result would be most useful and probably would not require much
> >>>> additional effort (assuming `disjoin` computes this internally)
> >>>>
> >>>> Of course, it is easy to live without since I can just perform the
> >>>> findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
> >>>>
> >>>> Ahoy!
> >>>>
> >>>> ~Malcolm
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org]
> >>>>> On Behalf Of Hahne, Florian
> >>>>> Sent: Wednesday, March 14, 2012 2:22 PM
> >>>>> To: bioc-devel at r-project.org
> >>>>> Subject: [Bioc-devel] mapping between original and reduced ranges
> >>>>>
> >>>>> This bounced before; I guess the mailing list does not like HTML mails.
> >>>>> So one more try:
> >>>>>
> >>>>> I had the following offline discussion with Michael about how one could
> >>>>> retain a mapping of the ranges in a GRanges object before and after
> >>>>> reduce. He suggested taking it to the list. Is that something that
> >>>>> could be added to GenomicRanges/IRanges?
> >>>>> Florian
> >>>>>
> >>>>> I have a slightly tricky application for which I need to reduce a
> >>>>> GRanges object, but I would like to be able to process some of the original
> >>>>> elementMetadata of the merged ranges later. The only way I was able to
> >>>>> figure out which of the original ranges correspond to the merged ranges
> >>>>> was to perform a findOverlaps operation, but of course that is rather
> >>>>> costly. Is there a way to get the merge information out of the original
> >>>>> reduce call?
> >>>>> Here is a brief example:
> >>>>>
> >>>>> gr <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1,6,12,24,27), width=5),
> >>>>>               foo=1:5, bar=letters[1:5])
> >>>>> gr2 <- reduce(gr, min.gapwidth=1)
> >>>>> ind <- queryHits(findOverlaps(gr2, gr))
> >>>>> split(values(gr), ind)
> >>>>>
> >>>>>
> >>>>> Unfortunately, this is the idiom. I could see an improvement where
> >>>>> reduce or a similarly named function would return a Hits object (in addition
> >>>>> to the actual reduce result) that would indicate the mapping between the
> >>>>> input and reduced ranges. The RangesMapping structure would be really
> >>>>> close to what we would need.
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>
> >
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>