[Bioc-devel] mapping between original and reduced ranges
Hahne, Florian
florian.hahne at novartis.com
Sat Mar 17 09:43:37 CET 2012
Hi Herve,
Fair enough. Let's put the mapping in elementMetadata and leave the
folding to the user. Realistically, to do something useful with a
reasonably large SplitDataFrameList one would probably need some C code
anyways. Unless you want to wait forever to lapply over potentially
millions of list items...
Thanks,
Florian
On 3/16/12 11:20 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>Hi Florian,
>
>On 03/16/2012 01:02 AM, Hahne, Florian wrote:
>> Hi Herve, the two are related but not similar issues:
>>
>>
>>
>> On 3/16/12 6:07 AM, "Hervé Pagès"<hpages at fhcrc.org> wrote:
>>
>>> On 03/15/2012 04:13 PM, Hahne, Florian wrote:
>>>> Would such a solution also allow to keep the original elementMetadata
>>>>in
>>>> the respective list representation?
>>>
>>> Let see if I understand your question correctly.
>>>
>>> IIUC Michael is proposing to store the "reverse mapping" (i.e. the
>>> mapping from the reduced ranges to the original ranges) in the
>>> elementMetadata slot of the reduced object. Note that, unlike the
>>> "direct mapping" (i.e. the mapping from the original ranges to the
>>> reduced ranges), which is a many-to-1 relationship, the "reverse
>>> mapping is 1-to-many, hence the need to use an IntegerList to represent
>>> it. The good thing about this IntegerList is that it has the length of
>>> the reduced object so it fits naturally in its elementMetadata slot:
>>>
>>> > ir
>>> IRanges of length 5
>>> start end width
>>> [1] 24 28 5
>>> [2] 27 31 5
>>> [3] 1 5 5
>>> [4] 6 10 5
>>> [5] 12 16 5
>>> > ir2<- reduce(ir)
>>> > ir2
>>> IRanges of length 3
>>> start end width
>>> [1] 1 10 10
>>> [2] 12 16 5
>>> [3] 24 31 8
>>> > hits<- findOverlaps(ir, ir2)
>>> > rmap<- IntegerList(split(queryHits(hits), subjectHits(hits)))
>>> > rmap
>>> CompressedIntegerList of length 3
>>> [["1"]] 3 4
>>> [["2"]] 5
>>> [["3"]] 1 2
>>>
>>> The "direct mapping" is a simpler object (just an integer vector) but
>>>it
>>> has the length of the original object so it cannot be stored in the
>>> elementMetadata of the reduced object. It has to go somewhere else
>>> (attribute or metadata). A good thing about storing the "reverse
>>> mapping" in the elementMetadata slot is that there is no risk of
>>> "getting in the way" (i.e. clash with other stuff in that slot)
>>> because reduce() drops the original elementMetadata anyway. It also
>>> plays nice with subsetting the reduced object and with the vectorized
>>> behaviour of the "reduce" method for GRangesList.
>>>
>>> The only small con I see is that, like in the use case you show in your
>>> original post, the user might actually need the direct mapping, not the
>>> reverse one. However it's not too hard to reverse a mapping. There is
>>> actually a function in Biobase for doing this on a regular list:
>>>
>>> > as.integer(reverseSplit(as.list(rmap)))
>>> [1] 3 3 1 1 2
>>> > findOverlaps(ir, reduce(ir), select="first")
>>> [1] 3 3 1 1 2
>>>
>>> There is also revmap() in AnnotationDbi that does the same on Bimap
>>> objects. It also works on a regular list:
>>>
>>> > as.integer(revmap(as.list(rmap)))
>>> [1] 3 3 1 1 2
>>>
>>> We could move the revmap() generic from AnnotationDbi to IRanges (or
>>> to BiocGenerics) and add a method for IntegerList objects.
>>>
>>> I think this is actually my preferred solution so far.
>>
>> This sounds perfect to me and absolutely handles the mapping problem
>
>Great! Seems like we have an agreement :-) Now I find that putting this
>"reverse mapping" in the elementMetadata of the returned object is the
>right thing to do. The same can be done with disjoin() where the
>reverse mapping is many-to-many. I don't even see the need for an
>extra argument to reduce() or disjoin() anymore in order to let the
>user choose whether s/he wants the reverse mapping or not. The overhead
>should be minimal (the reverse mapping is light and easy to compute).
>
>If nobody objects, I'll put this on my list of things to do for BioC
>2.11.
>
>>
>>>
>>> Back to your question:
>>>
>>> Would such a solution also allow to keep the original
>>>elementMetadata
>>> in the respective list representation?
>>>
>>> Something like "folding" the original elementMetadata in a way that
>>> makes it fit in the elementMetadata of the reduced object? The original
>>> elementMetadata (a DataFrame) could actually be split in a
>>> SplitDataFrameList (i.e. conceptually a list of DataFrame's that all
>>> have the same columns), and this SplitDataFrameList stored in the
>>> elementMetadata of the reduced object. It would be stored as a single
>>> column though since elementMetadata must be DataFrame, it cannot
>>> be a SplitDataFrameList. That solution feels "heavy" to me i.e. it
>>> involves complex data structures that are not so easy to manipulate.
>>> I still like Michael's proposal better, it's much lighter.
>>
>> The reason I was asking this is that with the mapping I would actually
>> split the old elementMetadata and do some processing on it. So in
>>addition
>> to being able to add the mapping I thought it would be kind of useful to
>> retain the original metadata in a "folded" structure similar to the
>> IntegerList approach Michael is proposing. Essentially one would
>>collapse
>> each of the existing elementMetadata columns in a list representation.
>>At
>> first I found it a bit surprising that all my elementMetadata was gone
>> after reduce, but after some thinking it actually makes sense, since the
>> original ranges are gone, too, and having a reduced/folded
>>elementMetadata
>> slot in the default output would be very confusing. I was just thinking
>>to
>> have this as an option, mainly because I can see this as a general use
>> case, and It would make subsequent operations a little more coherent
>> compared to having a folded elementMetadata representation and the
>>reduced
>> ranges in two separate objects.
>
>Note that the user will be able to easily do this folding with:
>
> gr2 <- reduce(gr)
> map <- unlist(revmap(elementMetadata(gr2)$rmap)) # reverse
> split(elementMetadata(gr), map) # returns a SplitDataFrameList
>
>The man page for reduce() will show how to do this. I'm a little bit
>hesitant though to add this as an option to reduce() because of what
>I said earlier i.e. the SplitDataFrameList would then need to be stored
>in elementMetadata(gr2) as a *single* column with something like:
>
> elementMetadata(gr2)$foldedOriginalMetadata
>
>... which is a little weird, and only experts would actually know how
>to access specific data in this folded thing. But if people want this,
>no problem, I'll add it to my list too.
>
>Cheers,
>H.
>
>
>>
>>>
>>> Cheers,
>>> H.
>>>
>>>> I assume that creates about the same
>>>> overhead as keeping the index?
>>>>
>>>>
>>>> Florian Hahne
>>>> Novartis Institute For Biomedical Research
>>>> Translational Sciences / Preclinical Safety / PCS Informatics
>>>> Expert Data Integration and Modeling Bioinformatics
>>>> CHBS, WKL-135.2.26
>>>> Novartis Institute For Biomedical Research, Werk Klybeck
>>>> Klybeckstrasse 141
>>>> CH-4057 Basel
>>>> Switzerland
>>>> Phone: +41 61 6967127
>>>> Email : florian.hahne at novartis.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 3/15/12 11:58 PM, "Michael Lawrence"<lawrence.michael at gene.com>
>>>> wrote:
>>>>
>>>>> I would be in favor of either the attribute or metadata solution. I
>>>>> could
>>>>> see having an IntegerList element in the element metadata that
>>>>> indicates
>>>>> the original ranges that were reduced into the returned range, or a
>>>>> Hits
>>>>> object in the top-level metadata. A plus and minus of the metadata
>>>>> approach
>>>>> is that it is more familiar to the users than hiding stuff in
>>>>> attributes,
>>>>> which is pretty low-level. However, using the metadata will increase
>>>>> the
>>>>> probabilty of "getting in the way". The user does need to explicitly
>>>>> request it though.
>>>>>
>>>>> Michael
>>>>>
>>>>> On Thu, Mar 15, 2012 at 3:26 PM, Hervé Pagès<hpages at fhcrc.org>
>>>>>wrote:
>>>>>
>>>>>> On 03/15/2012 02:40 PM, Kasper Daniel Hansen wrote:
>>>>>>
>>>>>>> I'll vote against the attribute solution and for a solution where
>>>>>>>the
>>>>>>> type of return object gets changed, for example into a list.
>>>>>>>
>>>>>>
>>>>>> Thanks for voting!
>>>>>>
>>>>>> Problem with this is how you handle 'with.mapping=TRUE' when the
>>>>>>input
>>>>>> is GRangesList. Do you return
>>>>>>
>>>>>> (1) a list of the same length as the input 'x', where the i-th
>>>>>> top-level element is itself the 2-element list returned
>>>>>> by reduce(x[[i]], with.mapping=TRUE)
>>>>>>
>>>>>> (2) or a 2-element list where 1 element is the reduced
>>>>>>GRangesList
>>>>>> and the other element is an IntegerList representing the
>>>>>> list of mappings?
>>>>>>
>>>>>> (1) would be very inefficient because the returned object would need
>>>>>> to be populated with hundreds of thousands of S4 instances.
>>>>>>
>>>>>> (2) disrupts too much how reduce() is expected to behave on a
>>>>>> GRangesList object i.e. it's expected to operate in a "vectorized"
>>>>>> fashion, that is, each top-level element in the input is reduced
>>>>>> independently of the others and all the results are stored in a
>>>>>> list-like object of the *same length* as the input. So we have nice
>>>>>> properties like
>>>>>>
>>>>>> reduce(x, ...)[[i]] is identical to reduce(x[[i]], ...)
>>>>>>
>>>>>> Here that would not be the case anymore :-/
>>>>>>
>>>>>> More generally speaking, I would not give up on the "endomorphism"
>>>>>> nature of reduce() so easily. It gives us good things like for
>>>>>> example its behaviour on a GRangesList object can be explained
>>>>>> as easily as with
>>>>>>
>>>>>> endoapply(x, reduce, ....)
>>>>>>
>>>>>> *whatever* arguments/parameters/toggles are passed to it. This
>>>>>> makes the documentation *much* easier to write and also it makes
>>>>>> writing unit test much easier.
>>>>>>
>>>>>> So if we really want to go for the list solution, I would suggest
>>>>>> that this is done outside reduce() e.g. in reduceAndMap() or
>>>>>> something like that.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> H.
>>>>>>
>>>>>>
>>>>>>> Kasper
>>>>>>>
>>>>>>> 2012/3/15 Hervé Pagès<hpages at fhcrc.org>:
>>>>>>>
>>>>>>>> On 03/15/2012 12:45 PM, Cook, Malcolm wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Herve,
>>>>>>>>>
>>>>>>>>> I've not used attributes to return values before.
>>>>>>>>>
>>>>>>>>> I guess it would work, and I won't object further if you do it
>>>>>>>>>this
>>>>>>>>> way,
>>>>>>>>> but, since you asked
>>>>>>>>>
>>>>>>>>> Again, it "feels wrong" in violating functional
>>>>>>>>>
>>>>>>>>> I suspect there may be issues with memory management. When does
>>>>>>>>> the
>>>>>>>>> attribute get gc-ed? When the object does? If so, then,
>>>>>>>>>retaining
>>>>>>>>> the
>>>>>>>>> attribute in memory when not needed _could_ be a burden, no?
>>>>>>>>>
>>>>>>>>> Back in my lisp days, this is when I would use `values` and
>>>>>>>>> `multiple-value-bind` (and friends) when I wanted a function to
>>>>>>>>> (optionally)
>>>>>>>>> return multiple values.
>>>>>>>>>
>>>>>>>>> But this is R.
>>>>>>>>>
>>>>>>>>> Would you consider returning instead a list of values, keyed by
>>>>>>>>> `value`
>>>>>>>>> and `hits`, but only when with.hits
>>>>>>>>>
>>>>>>>>> BTW: with.inframe.attrib is documented as 'For internal use'.
>>>>>>>>>What
>>>>>>>>> does
>>>>>>>>> it return in the attr?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> AFAIK, it's only supported by the "reduce" methods for IRanges
>>>>>>>> objects.
>>>>>>>>
>>>>>>>> The "inframe" attribute contains an IRanges object of the same
>>>>>>>> length
>>>>>>>> as
>>>>>>>> the input. For each range in the input it tells you the position
>>>>>>>>of
>>>>>>>> that range with respect to the "frame" i.e. the space obtained by
>>>>>>>> pasting together the ranges in the reduce object:
>>>>>>>>
>>>>>>>>
>>>>>>>> > ir
>>>>>>>> IRanges of length 5
>>>>>>>> start end width
>>>>>>>> [1] 24 28 5
>>>>>>>> [2] 27 31 5
>>>>>>>> [3] 1 5 5
>>>>>>>> [4] 6 10 5
>>>>>>>> [5] 12 16 5
>>>>>>>>
>>>>>>>> > ir2<- reduce(ir, with.inframe.attrib=TRUE)
>>>>>>>> > ir2
>>>>>>>> IRanges of length 3
>>>>>>>> start end width
>>>>>>>> [1] 1 10 10
>>>>>>>> [2] 12 16 5
>>>>>>>> [3] 24 31 8
>>>>>>>> > attr(ir2, "inframe")
>>>>>>>> IRanges of length 5
>>>>>>>> start end width
>>>>>>>> [1] 16 20 5
>>>>>>>> [2] 19 23 5
>>>>>>>> [3] 1 5 5
>>>>>>>> [4] 6 10 5
>>>>>>>> [5] 11 15 5
>>>>>>>>
>>>>>>>>
>>>>>>>> 1 1 2 2 3
>>>>>>>> 1...5....0....5....0....5....**0.<- standard coordinate
>>>>>>>> system
>>>>>>>> ir[1] xxxxx
>>>>>>>> ir[2] xxxxx
>>>>>>>> ir[3] xxxxx
>>>>>>>> ir[4] xxxxx
>>>>>>>> ir[5] xxxxx
>>>>>>>>
>>>>>>>> ir2: xxxxxxxxxx xxxxx xxxxxxxx
>>>>>>>>
>>>>>>>> 1...5....1 ....1 ....2...<- "frame" coordinate
>>>>>>>>system
>>>>>>>> 0 5 0
>>>>>>>>
>>>>>>>> I'll document this.
>>>>>>>>
>>>>>>>> H.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks for listening!
>>>>>>>>>
>>>>>>>>> ~Malcolm
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>>> From:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>bioc-devel-bounces at r-project.**org<bioc-devel-bounces at r-project.o
>>>>>>>>>>rg
>>>>>>>>>>> [
>>>>>>>>>> mailto:
>>>>>>>>>> bioc-devel-bounces at r-
>>>>>>>>>> project.org] On Behalf Of Hervé Pagès
>>>>>>>>>> Sent: Thursday, March 15, 2012 1:55 PM
>>>>>>>>>> To: Kasper Daniel Hansen
>>>>>>>>>> Cc: bioc-devel at r-project.org
>>>>>>>>>> Subject: Re: [Bioc-devel] mapping between original and reduced
>>>>>>>>>> ranges
>>>>>>>>>>
>>>>>>>>>> Hi reducers,
>>>>>>>>>>
>>>>>>>>>> I agree it "feels wrong" to use findOverlaps() to extract the
>>>>>>>>>> mapping
>>>>>>>>>> from original to reduced ranges. Even if it can be computed very
>>>>>>>>>> easily
>>>>>>>>>> with:
>>>>>>>>>>
>>>>>>>>>> findOverlaps(gr, reduce(gr), select="first")
>>>>>>>>>>
>>>>>>>>>> (Note that using 'queryHits(findOverlaps(**reduce(gr), gr))'
>>>>>>>>>>only
>>>>>>>>>> produces
>>>>>>>>>> the correct result if 'gr' is already sorted by increasing
>>>>>>>>>>order.)
>>>>>>>>>>
>>>>>>>>>> I think it would be easy for reduce() internal code to produce
>>>>>>>>>> this
>>>>>>>>>> mapping. The question is: how do we give it back to the user?
>>>>>>>>>>
>>>>>>>>>> Is it OK to use an attribute for this? reduce() already uses
>>>>>>>>>>this
>>>>>>>>>> for returning some extra information about the reduction:
>>>>>>>>>>
>>>>>>>>>> > ir
>>>>>>>>>> IRanges of length 5
>>>>>>>>>> start end width
>>>>>>>>>> [1] 1 5 5
>>>>>>>>>> [2] 6 10 5
>>>>>>>>>> [3] 12 16 5
>>>>>>>>>> [4] 24 28 5
>>>>>>>>>> [5] 27 31 5
>>>>>>>>>> > ir2<- reduce(ir, with.inframe.attrib=TRUE)
>>>>>>>>>> > ir2
>>>>>>>>>> IRanges of length 3
>>>>>>>>>> start end width
>>>>>>>>>> [1] 1 10 10
>>>>>>>>>> [2] 12 16 5
>>>>>>>>>> [3] 24 31 8
>>>>>>>>>> > attr(ir2, "inframe")
>>>>>>>>>> IRanges of length 5
>>>>>>>>>> start end width
>>>>>>>>>> [1] 1 5 5
>>>>>>>>>> [2] 6 10 5
>>>>>>>>>> [3] 11 15 5
>>>>>>>>>> [4] 16 20 5
>>>>>>>>>> [5] 19 23 5
>>>>>>>>>>
>>>>>>>>>> We could to the same thing for the mapping from original to
>>>>>>>>>> reduced
>>>>>>>>>> ranges with e.g. an argument called 'with.mapping.attrib'.
>>>>>>>>>> Would that work?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> H.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/15/2012 05:44 AM, Kasper Daniel Hansen wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So the key question is to what extent keeping track of where
>>>>>>>>>>>the
>>>>>>>>>>> ranges comes from would slow down the reduce operation. I am
>>>>>>>>>>>not
>>>>>>>>>>> familiar enough with the algorithm to know this, but given how
>>>>>>>>>>> fast
>>>>>>>>>>> IRanges is in general, I am not one for guessing on this.
>>>>>>>>>>>
>>>>>>>>>>> I agree with Florian that this is a very typical use case.
>>>>>>>>>>>
>>>>>>>>>>> Kasper
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
>>>>>>>>>>> <florian.hahne at novartis.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> It is true that this is not terribly slow when you deal with
>>>>>>>>>>>> fairly
>>>>>>>>>>>> large
>>>>>>>>>>>> range objects:
>>>>>>>>>>>>
>>>>>>>>>>>> foo<- GRanges(seqnames=sample(1:4, 1e6, TRUE),
>>>>>>>>>>>> ranges=IRanges(start=as.**integer(runif(min=1, max=1e7,
>>>>>>>>>>>>n=1e6)),
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> width=50))
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> system.time(bar<- reduce(foo))
>>>>>>>>>>>> user system elapsed
>>>>>>>>>>>> 0.918 0.174 1.091
>>>>>>>>>>>>
>>>>>>>>>>>> system.time(foobar<- findOverlaps(foo, bar))
>>>>>>>>>>>> user system elapsed
>>>>>>>>>>>> 2.051 0.402 2.453
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> However the whole process does take about 3x the time of just
>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> reduce
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> operation, and in my use case I want this to happen
>>>>>>>>>>>> interactively,
>>>>>>>>>>>> where
>>>>>>>>>>>> waiting 3 seconds compared to 1 makes a huge difference...
>>>>>>>>>>>>
>>>>>>>>>>>> I wouldn't push this high up on the development agenda, but it
>>>>>>>>>>>> seems
>>>>>>>>>>>> to
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> be
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> something that is already 95% existing and could easily be
>>>>>>>>>>>> added.
>>>>>>>>>>>> But
>>>>>>>>>>>> maybe I am wrong...
>>>>>>>>>>>>
>>>>>>>>>>>> Florian
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Florian Hahne
>>>>>>>>>>>> Novartis Institute For Biomedical Research
>>>>>>>>>>>> Translational Sciences / Preclinical Safety / PCS Informatics
>>>>>>>>>>>> Expert Data Integration and Modeling Bioinformatics
>>>>>>>>>>>> CHBS, WKL-135.2.26
>>>>>>>>>>>> Novartis Institute For Biomedical Research, Werk Klybeck
>>>>>>>>>>>> Klybeckstrasse 141
>>>>>>>>>>>> CH-4057 Basel
>>>>>>>>>>>> Switzerland
>>>>>>>>>>>> Phone: +41 61 6967127
>>>>>>>>>>>> Email : florian.hahne at novartis.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/14/12 9:40 PM, "Kasper Daniel
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Hansen"<kasperdanielhansen@**gmail.com
>>>>>>>>>> <kasperdanielhansen at gmail.com>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> We have discussed this a couple of times. I routinely uses
>>>>>>>>>>>> the
>>>>>>>>>>>>> reduce
>>>>>>>>>>>>> followed by findOverlaps paradigm. As Malcolm says it feels
>>>>>>>>>>>>> wrong,
>>>>>>>>>>>>> but from a practical point of view it is pretty fast, so I
>>>>>>>>>>>>> stopped
>>>>>>>>>>>>> worrying about it. I only think there is a reason to do
>>>>>>>>>>>>>this,
>>>>>>>>>>>>> if it
>>>>>>>>>>>>> is substantially faster.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kasper
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 14, 2012 at 3:46 PM, Cook,
>>>>>>>>>>>>>Malcolm<MEC at stowers.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> Chiming in....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> on a similar note....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A version of `disjoin` which returns a Hits/RangesMapping
>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> the GRanges result would be most useful and probably not
>>>>>>>>>>>>>> require
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>> much
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> additional effort (assuming `disjoin` computes this
>>>>>>>>>>>>>> internally)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course, it is easy to live without since I can just
>>>>>>>>>>>>>>perform
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> findOverlaps myself after the disjoin.... it just "feels
>>>>>>>>>>>>>> wrong"
>>>>>>>>>>>>>> (tm)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ahoy!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~Malcolm
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>bioc-devel-bounces at r-project.**org<bioc-devel-bounces at r-proj
>>>>>>>>>>>>>>>ec
>>>>>>>>>>>>>>> t.
>>>>>>>>>>>>>>> org>[mailto:
>>>>>>>>>>>>>>> bioc-devel-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> bounces at r-
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> project.org] On Behalf Of Hahne, Florian
>>>>>>>>>>>>>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>>>>>>>>>>>>>> To: bioc-devel at r-project.org
>>>>>>>>>>>>>>> Subject: [Bioc-devel] mapping between original and reduced
>>>>>>>>>>>>>>> ranges
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This bounced before, guess the mailing list does not like
>>>>>>>>>>>>>>> HTML
>>>>>>>>>>>>>>> mails.
>>>>>>>>>>>>>>> So
>>>>>>>>>>>>>>> one more try:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I had the following offline discussion with Michael about
>>>>>>>>>>>>>>>how
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> could
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> retain a mapping of the ranges in a GRanges object before
>>>>>>>>>>>>>>>and
>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>> reduce. He suggested to take it to the list. Is that
>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> be added to GenomicRanges/IRanges?
>>>>>>>>>>>>>>> Florian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a slightly tricky application for which I need to
>>>>>>>>>>>>>>> reduce a
>>>>>>>>>>>>>>> GRanges
>>>>>>>>>>>>>>> object, but I would like to be able to process some of the
>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>> elementMetadata of the merged ranges later. The only way I
>>>>>>>>>>>>>>> was
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> able to
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> figure out which of the original ranges correspond to the
>>>>>>>>>>>>>>> merged
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> ranges
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> was to perform a findOverlaps operation, but of course that
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> rather
>>>>>>>>>>>>>>> costly. Is there a way to get the merge information out of
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> original
>>>>>>>>>>>>>>> reduce call?
>>>>>>>>>>>>>>> Here is a brief example:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> gr<- GRanges(seqnames="chr1",
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> ranges=IRanges(start=c(1,6,12,**24,27),
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> width=5), foo=1:5, bar=letters[1:5])
>>>>>>>>>>>>>>> gr2<- reduce(gr, min.gapwidth=1)
>>>>>>>>>>>>>>> ind<- queryHits(findOverlaps(gr2, gr))
>>>>>>>>>>>>>>> split(values(gr), ind)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately, this is the idiom. I could see an
>>>>>>>>>>>>>>>improvement
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>> or a similarly named function would return a Hits object
>>>>>>>>>>>>>>>(in
>>>>>>>>>>>>>>> addition
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> the actual reduce result) that would indicate the mapping
>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> input and reduced ranges. The RangesMapping structure would
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>> really
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> close to what we would need.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ______________________________**_________________
>>>>>>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://s
>>>>>>>>>>>>>>>ta
>>>>>>>>>>>>>>> t.
>>>>>>>>>>>>>>> ethz.ch/mailman/listinfo/bioc-devel>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ______________________________**_________________
>>>>>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://st
>>>>>>>>>>>>>>at
>>>>>>>>>>>>>> .e
>>>>>>>>>>>>>> thz.ch/mailman/listinfo/bioc-devel>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> ______________________________**_________________
>>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.
>>>>>>>>>>>et
>>>>>>>>>>> hz
>>>>>>>>>>> .ch/mailman/listinfo/bioc-devel>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Hervé Pagès
>>>>>>>>>>
>>>>>>>>>> Program in Computational Biology
>>>>>>>>>> Division of Public Health Sciences
>>>>>>>>>> Fred Hutchinson Cancer Research Center
>>>>>>>>>> 1100 Fairview Ave. N, M1-B514
>>>>>>>>>> P.O. Box 19024
>>>>>>>>>> Seattle, WA 98109-1024
>>>>>>>>>>
>>>>>>>>>> E-mail: hpages at fhcrc.org
>>>>>>>>>> Phone: (206) 667-5791
>>>>>>>>>> Fax: (206) 667-1319
>>>>>>>>>>
>>>>>>>>>> ______________________________**_________________
>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.e
>>>>>>>>>>th
>>>>>>>>>> z.
>>>>>>>>>> ch/mailman/listinfo/bioc-devel>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Hervé Pagès
>>>>>>>>
>>>>>>>> Program in Computational Biology
>>>>>>>> Division of Public Health Sciences
>>>>>>>> Fred Hutchinson Cancer Research Center
>>>>>>>> 1100 Fairview Ave. N, M1-B514
>>>>>>>> P.O. Box 19024
>>>>>>>> Seattle, WA 98109-1024
>>>>>>>>
>>>>>>>> E-mail: hpages at fhcrc.org
>>>>>>>> Phone: (206) 667-5791
>>>>>>>> Fax: (206) 667-1319
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Hervé Pagès
>>>>>>
>>>>>> Program in Computational Biology
>>>>>> Division of Public Health Sciences
>>>>>> Fred Hutchinson Cancer Research Center
>>>>>> 1100 Fairview Ave. N, M1-B514
>>>>>> P.O. Box 19024
>>>>>> Seattle, WA 98109-1024
>>>>>>
>>>>>> E-mail: hpages at fhcrc.org
>>>>>> Phone: (206) 667-5791
>>>>>> Fax: (206) 667-1319
>>>>>>
>>>>>> ______________________________**_________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>
>>>>>>
>>>>>>
>>>>>>https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.
>>>>>>ch
>>>>>> /m
>>>>>> ailman/listinfo/bioc-devel>
>>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>
>
>
>--
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone: (206) 667-5791
>Fax: (206) 667-1319
More information about the Bioc-devel
mailing list