[Bioc-devel] Re: Bioc-devel Digest, Vol 96, Issue 20

Sat Mar 17 20:54:05 CET 2012

C.                                                      Ck km
Sent from my Vivo BlackBerry® device

-----Original Message-----
From: bioc-devel-request at r-project.org
Sender: bioc-devel-bounces at r-project.org
Date: Fri, 16 Mar 2012 23:20:20 
To: <bioc-devel at r-project.org>
Reply-To: bioc-devel at r-project.org
Subject: Bioc-devel Digest, Vol 96, Issue 20

Send Bioc-devel mailing list submissions to
	bioc-devel at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://stat.ethz.ch/mailman/listinfo/bioc-devel
or, via email, send a message with subject or body 'help' to
	bioc-devel-request at r-project.org

You can reach the person managing the list at
	bioc-devel-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioc-devel digest..."

Today's Topics:

   1. Re: Alert to developers about changes to edgeR (Nicolas Delhomme)
   2. Re: mapping between original and reduced ranges (Hervé Pagès)

----------------------------------------------------------------------

Message: 1
Date: Fri, 16 Mar 2012 16:40:57 +0100
From: Nicolas Delhomme <delhomme at embl.de>
To: Gordon K Smyth <smyth at wehi.EDU.AU>
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Alert to developers about changes to edgeR
Message-ID: <8411B109-60EC-46F0-A40D-CB584E56E4A1 at embl.de>
Content-Type: text/plain; charset=us-ascii

Dear Gordon,

Thanks for letting us know!

Best,

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On 16 Mar 2012, at 03:40, Gordon K Smyth wrote:

> The deadline for the 2.10 Bioconductor release is very close.  There are a few packages that depend on or suggest edgeR.  This email is to alert developers to changes in edgeR that might potentially break code.  One of the changes below was made only two weeks ago, so I thought it would be worth an email to alert people.
> 
> 1. estimateCommonDisp() and estimateGLMCommonDisp() now return dispersion=NA when there is no replication.  This is to force users to make a decision for themselves about what action to take in this case. This means that any subsequent function that uses the common dispersion will fail when there is no replication.  If you want to reproduce earlier behaviour, you need to explicitly enter dispersion=0 to subsequent functions.
> 
> 2. The column headings in the table returned by exactTest() and glmLRT() have changed.  For example, the column called logConc is now logCPM (log counts-per-million).  If anyone is pulling out columns such as logConc by name, such code might be broken.
> 
> There are lots of other changes since the last release, but the above two are fairly recent and the most likely to break dependent code.
> 
> Best wishes
> Gordon
> 
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> smyth at wehi.edu.au
> http://www.wehi.edu.au
> http://www.statsci.org/smyth
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
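Code that depends on edgeR could guard against both changes along the following lines. This is only a sketch under stated assumptions: the two data frames are plain stand-ins for the table returned by exactTest() or glmLRT(), and the helpers get_abundance() and resolve_dispersion() are hypothetical names invented for the illustration, not part of edgeR. The first helper only locates the column; it does not convert between the old and new quantities.

```r
# Stand-ins for the result table before and after the renaming; plain
# data.frames are used so the sketch runs without edgeR installed.
old_style <- data.frame(logConc = c(-12.1, -9.4), logFC = c(0.5, -1.2))
new_style <- data.frame(logCPM  = c(4.2, 6.9),    logFC = c(0.5, -1.2))

# Hypothetical helper: fetch the abundance column under whichever name exists,
# preferring the new name.
get_abundance <- function(tab) {
  hit <- intersect(c("logCPM", "logConc"), colnames(tab))
  if (length(hit) == 0L) stop("no logCPM/logConc column found")
  tab[[hit[1L]]]
}

# Hypothetical helper: make the "no replication" decision explicit instead of
# letting an NA dispersion propagate into later calls.
resolve_dispersion <- function(dispersion, fallback = NULL) {
  if (!is.na(dispersion)) return(dispersion)
  if (is.null(fallback))
    stop("dispersion is NA (no replication); choose a value explicitly")
  fallback
}

abundance  <- get_abundance(new_style)
dispersion <- resolve_dispersion(NA, fallback = 0)  # reproduces the earlier behaviour
```

Passing fallback = 0 corresponds to the "explicitly enter dispersion=0" advice in point 1; leaving it out turns the NA into an error at the point where the decision has to be made.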

------------------------------

Message: 2
Date: Fri, 16 Mar 2012 15:20:06 -0700
From: Hervé Pagès <hpages at fhcrc.org>
To: "Hahne, Florian" <florian.hahne at novartis.com>
Cc: Michael Lawrence <lawrence.michael at gene.com>,
	"bioc-devel at r-project.org" <bioc-devel at r-project.org>
Subject: Re: [Bioc-devel] mapping between original and reduced ranges
Message-ID: <4F63BC96.6080205 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Florian,

On 03/16/2012 01:02 AM, Hahne, Florian wrote:
> Hi Herve, the two are related but not similar issues:
>
>
>
> On 3/16/12 6:07 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>
>> On 03/15/2012 04:13 PM, Hahne, Florian wrote:
>>> Would such a solution also allow to keep the original elementMetadata in
>>> the respective list representation?
>>
>> Let's see if I understand your question correctly.
>>
>> IIUC Michael is proposing to store the "reverse mapping" (i.e. the
>> mapping from the reduced ranges to the original ranges) in the
>> elementMetadata slot of the reduced object. Note that, unlike the
>> "direct mapping" (i.e. the mapping from the original ranges to the
>> reduced ranges), which is a many-to-1 relationship, the "reverse
>> mapping" is 1-to-many, hence the need to use an IntegerList to represent
>> it. The good thing about this IntegerList is that it has the length of
>> the reduced object so it fits naturally in its elementMetadata slot:
>>
>>    >  ir
>>    IRanges of length 5
>>        start end width
>>    [1]    24  28     5
>>    [2]    27  31     5
>>    [3]     1   5     5
>>    [4]     6  10     5
>>    [5]    12  16     5
>>    >  ir2<- reduce(ir)
>>    >  ir2
>>    IRanges of length 3
>>        start end width
>>    [1]     1  10    10
>>    [2]    12  16     5
>>    [3]    24  31     8
>>    >  hits<- findOverlaps(ir, ir2)
>>    >  rmap<- IntegerList(split(queryHits(hits), subjectHits(hits)))
>>    >  rmap
>>    CompressedIntegerList of length 3
>>    [["1"]] 3 4
>>    [["2"]] 5
>>    [["3"]] 1 2
>>
>> The "direct mapping" is a simpler object (just an integer vector) but it
>> has the length of the original object so it cannot be stored in the
>> elementMetadata of the reduced object. It has to go somewhere else
>> (attribute or metadata). A good thing about storing the "reverse
>> mapping" in the elementMetadata slot is that there is no risk of
>> "getting in the way" (i.e. clash with other stuff in that slot)
>> because reduce() drops the original elementMetadata anyway. It also
>> plays nice with subsetting the reduced object and with the vectorized
>> behaviour of the "reduce" method for GRangesList.
>>
>> The only small con I see is that, like in the use case you show in your
>> original post, the user might actually need the direct mapping, not the
>> reverse one. However it's not too hard to reverse a mapping. There is
>> actually a function in Biobase for doing this on a regular list:
>>
>>    >  as.integer(reverseSplit(as.list(rmap)))
>>    [1] 3 3 1 1 2
>>    >  findOverlaps(ir, reduce(ir), select="first")
>>    [1] 3 3 1 1 2
>>
>> There is also revmap() in AnnotationDbi that does the same on Bimap
>> objects. It also works on a regular list:
>>
>>    >  as.integer(revmap(as.list(rmap)))
>>    [1] 3 3 1 1 2
>>
>> We could move the revmap() generic from AnnotationDbi to IRanges (or
>> to BiocGenerics) and add a method for IntegerList objects.
>>
>> I think this is actually my preferred solution so far.
>
> This sounds perfect to me and absolutely handles the mapping problem

Great! Seems like we have an agreement :-) Now I find that putting this
"reverse mapping" in the elementMetadata of the returned object is the
right thing to do. The same can be done with disjoin() where the
reverse mapping is many-to-many. I don't even see the need for an
extra argument to reduce() or disjoin() anymore in order to let the
user choose whether s/he wants the reverse mapping or not. The overhead
should be minimal (the reverse mapping is light and easy to compute).

If nobody objects, I'll put this on my list of things to do for BioC
2.11.
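
To make the idea concrete without depending on IRanges, here is a minimal base-R sketch of a reduce that also returns the reverse mapping. It is an illustration only, not how reduce() is implemented: the function name reduce_with_revmap() and its plain-list return value are assumptions made for this example. Ranges are given as integer start/end vectors and are merged when they overlap or are adjacent.

```r
# Merge overlapping or adjacent integer ranges and record, for each merged
# range, the indices of the input ranges that were folded into it.
reduce_with_revmap <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  n <- length(start)
  if (n == 0L)
    return(list(start = integer(0), end = integer(0), revmap = list()))
  ord <- order(start, end)          # process ranges from left to right
  s <- start[ord]
  e <- end[ord]
  # Largest end seen strictly before each range (in sorted order).
  prev_max_end <- c(-Inf, cummax(e)[-n])
  # A new merged range starts wherever at least one position is left uncovered.
  group <- cumsum(s > prev_max_end + 1)
  list(
    start  = as.integer(tapply(s, group, min)),
    end    = as.integer(tapply(e, group, max)),
    revmap = unname(lapply(split(ord, group), sort))   # 1-to-many
  )
}

ir_start <- c(24L, 27L, 1L, 6L, 12L)
res <- reduce_with_revmap(ir_start, ir_start + 4L)
res$revmap   # (3 4), (5), (1 2): same grouping as rmap in the quoted example

# The many-to-1 direct mapping is recovered by inverting the reverse mapping.
direct <- integer(length(ir_start))
direct[unlist(res$revmap)] <- rep(seq_along(res$revmap), lengths(res$revmap))
direct       # 3 3 1 1 2
```

The reverse mapping has one element per reduced range, which is why it fits in the elementMetadata of the reduced object, whereas the direct mapping has one element per original range.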

>
>>
>> Back to your question:
>>
>>    Would such a solution also allow to keep the original elementMetadata
>>    in the respective list representation?
>>
>> Something like "folding" the original elementMetadata in a way that
>> makes it fit in the elementMetadata of the reduced object? The original
>> elementMetadata (a DataFrame) could actually be split in a
>> SplitDataFrameList (i.e. conceptually a list of DataFrame's that all
>> have the same columns), and this SplitDataFrameList stored in the
>> elementMetadata of the reduced object. It would be stored as a single
>> column though since elementMetadata must be DataFrame, it cannot
>> be a SplitDataFrameList. That solution feels "heavy" to me i.e. it
>> involves complex data structures that are not so easy to manipulate.
>> I still like Michael's proposal better, it's much lighter.
>
> The reason I was asking this is that with the mapping I would actually
> split the old elementMetadata and do some processing on it. So in addition
> to being able to add the mapping I thought it would be kind of useful to
> retain the original metadata in a "folded" structure similar to the
> IntegerList approach Michael is proposing. Essentially one would collapse
> each of the existing elementMetadata columns in a list representation. At
> first I found it a bit surprising that all my elementMetadata was gone
> after reduce, but after some thinking it actually makes sense, since the
> original ranges are gone, too, and having a reduced/folded elementMetadata
> slot in the default output would be very confusing. I was just thinking to
> have this as an option, mainly because I can see this as a general use
> case, and it would make subsequent operations a little more coherent
> compared to having a folded elementMetadata representation and the reduced
> ranges in two separate objects.

Note that the user will be able to easily do this folding with:

   gr2 <- reduce(gr)
   map <- unlist(revmap(elementMetadata(gr2)$rmap))  # reverse
   split(elementMetadata(gr), map)  # returns a SplitDataFrameList

The man page for reduce() will show how to do this. I'm a little bit
hesitant though to add this as an option to reduce() because of what
I said earlier i.e. the SplitDataFrameList would then need to be stored
in elementMetadata(gr2) as a *single* column with something like:

   elementMetadata(gr2)$foldedOriginalMetadata

... which is a little weird, and only experts would actually know how
to access specific data in this folded thing. But if people want this,
no problem, I'll add it to my list too.
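
The folding step itself can be tried in base R. This is a sketch under stated assumptions: the rmap column and the revmap() method for IntegerList used in the snippet above are proposals in this thread, so a plain list stands in for the reverse mapping and a plain data.frame stands in for the original elementMetadata.

```r
# Reverse mapping from the quoted example: reduced range i -> original indices.
rmap <- list(c(3L, 4L), 5L, c(1L, 2L))

# Stand-in for the original elementMetadata (a data.frame, not a DataFrame).
meta <- data.frame(foo = 1:5, bar = letters[1:5], stringsAsFactors = FALSE)

# Invert the 1-to-many reverse mapping into the many-to-1 direct mapping.
direct <- integer(nrow(meta))
direct[unlist(rmap)] <- rep(seq_along(rmap), lengths(rmap))

# Fold the metadata: one data.frame per reduced range, named "1", "2", "3".
folded <- split(meta, direct)

# Lighter alternative: collapse a single column into a list, one element per
# reduced range.
foo_by_range <- unname(split(meta$foo, direct))
```

Here folded[["1"]] holds the rows of the original ranges 3 and 4, and foo_by_range has the same length as the reduced object, so a per-column list of this kind could sit next to the reverse mapping without a nested data-frame structure.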

Cheers,
H.

>
>>
>> Cheers,
>> H.
>>
>>> I assume that creates about the same
>>> overhead as keeping the index?
>>>
>>>
>>> Florian Hahne
>>> Novartis Institute For Biomedical Research
>>> Translational Sciences / Preclinical Safety / PCS Informatics
>>> Expert Data Integration and Modeling Bioinformatics
>>> CHBS, WKL-135.2.26
>>> Novartis Institute For Biomedical Research, Werk Klybeck
>>> Klybeckstrasse 141
>>> CH-4057 Basel
>>> Switzerland
>>> Phone: +41 61 6967127
>>> Email : florian.hahne at novartis.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 3/15/12 11:58 PM, "Michael Lawrence"<lawrence.michael at gene.com>
>>> wrote:
>>>
>>>> I would be in favor of either the attribute or metadata solution. I could
>>>> see having an IntegerList element in the element metadata that indicates
>>>> the original ranges that were reduced into the returned range, or a Hits
>>>> object in the top-level metadata. A plus and minus of the metadata approach
>>>> is that it is more familiar to the users than hiding stuff in attributes,
>>>> which is pretty low-level. However, using the metadata will increase the
>>>> probability of "getting in the way". The user does need to explicitly
>>>> request it though.
>>>>
>>>> Michael
>>>>
>>>> On Thu, Mar 15, 2012 at 3:26 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>>>>
>>>>> On 03/15/2012 02:40 PM, Kasper Daniel Hansen wrote:
>>>>>
>>>>>> I'll vote against the attribute solution and for a solution where the
>>>>>> type of return object gets changed, for example into a list.
>>>>>>
>>>>>
>>>>> Thanks for voting!
>>>>>
>>>>> Problem with this is how you handle 'with.mapping=TRUE' when the input
>>>>> is GRangesList. Do you return
>>>>>
>>>>>    (1) a list of the same length as the input 'x', where the i-th
>>>>>        top-level element is itself the 2-element list returned
>>>>>        by reduce(x[[i]], with.mapping=TRUE)
>>>>>
>>>>>    (2) or a 2-element list where 1 element is the reduced GRangesList
>>>>>        and the other element is an IntegerList representing the
>>>>>        list of mappings?
>>>>>
>>>>> (1) would be very inefficient because the returned object would need
>>>>> to be populated with hundreds of thousands of S4 instances.
>>>>>
>>>>> (2) disrupts too much how reduce() is expected to behave on a
>>>>> GRangesList object i.e. it's expected to operate in a "vectorized"
>>>>> fashion, that is, each top-level element in the input is reduced
>>>>> independently of the others and all the results are stored in a
>>>>> list-like object of the *same length* as the input. So we have nice
>>>>> properties like
>>>>>
>>>>>      reduce(x, ...)[[i]] is identical to reduce(x[[i]], ...)
>>>>>
>>>>> Here that would not be the case anymore :-/
>>>>>
>>>>> More generally speaking, I would not give up on the "endomorphism"
>>>>> nature of reduce() so easily. It gives us good things like for
>>>>> example its behaviour on a GRangesList object can be explained
>>>>> as easily as with
>>>>>
>>>>>      endoapply(x, reduce, ....)
>>>>>
>>>>> *whatever* arguments/parameters/toggles are passed to it. This
>>>>> makes the documentation *much* easier to write and also it makes
>>>>> writing unit tests much easier.
>>>>>
>>>>> So if we really want to go for the list solution, I would suggest
>>>>> that this is done outside reduce() e.g. in reduceAndMap() or
>>>>> something like that.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> H.
>>>>>
>>>>>
>>>>>> Kasper
>>>>>>
>>>>>> 2012/3/15 Hervé Pagès <hpages at fhcrc.org>:
>>>>>>
>>>>>>> On 03/15/2012 12:45 PM, Cook, Malcolm wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi Herve,
>>>>>>>>
>>>>>>>> I've not used attributes to return values before.
>>>>>>>>
>>>>>>>> I guess it would work, and I won't object further if you do it this
>>>>>>>> way, but, since you asked
>>>>>>>>
>>>>>>>> Again, it "feels wrong" in violating functional
>>>>>>>>
>>>>>>>> I suspect there may be issues with memory management.  When does the
>>>>>>>> attribute get gc-ed?  When the object does?  If so, then, retaining the
>>>>>>>> attribute in memory when not needed _could_ be a burden, no?
>>>>>>>>
>>>>>>>> Back in my lisp days, this is when I would use `values` and
>>>>>>>> `multiple-value-bind` (and friends) when I wanted a function to
>>>>>>>> (optionally) return multiple values.
>>>>>>>>
>>>>>>>> But this is R.
>>>>>>>>
>>>>>>>> Would you consider returning instead a list of values, keyed by `value`
>>>>>>>> and `hits`, but only when with.hits
>>>>>>>>
>>>>>>>> BTW: with.inframe.attrib is documented as 'For internal use'.  What does
>>>>>>>> it return in the attr?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> AFAIK, it's only supported by the "reduce" methods for IRanges objects.
>>>>>>>
>>>>>>> The "inframe" attribute contains an IRanges object of the same length as
>>>>>>> the input. For each range in the input it tells you the position of
>>>>>>> that range with respect to the "frame" i.e. the space obtained by
>>>>>>> pasting together the ranges in the reduced object:
>>>>>>>
>>>>>>>
>>>>>>>    >    ir
>>>>>>>    IRanges of length 5
>>>>>>>        start end width
>>>>>>>    [1]    24  28     5
>>>>>>>    [2]    27  31     5
>>>>>>>    [3]     1   5     5
>>>>>>>    [4]     6  10     5
>>>>>>>    [5]    12  16     5
>>>>>>>
>>>>>>>    >    ir2<- reduce(ir, with.inframe.attrib=TRUE)
>>>>>>>    >    ir2
>>>>>>>    IRanges of length 3
>>>>>>>        start end width
>>>>>>>    [1]     1  10    10
>>>>>>>    [2]    12  16     5
>>>>>>>    [3]    24  31     8
>>>>>>>    >    attr(ir2, "inframe")
>>>>>>>    IRanges of length 5
>>>>>>>        start end width
>>>>>>>    [1]    16  20     5
>>>>>>>    [2]    19  23     5
>>>>>>>    [3]     1   5     5
>>>>>>>    [4]     6  10     5
>>>>>>>    [5]    11  15     5
>>>>>>>
>>>>>>>
>>>>>>>                    1    1    2    2    3
>>>>>>>           1...5....0....5....0....5....0.<- standard coordinate system
>>>>>>>    ir[1]                         xxxxx
>>>>>>>    ir[2]                            xxxxx
>>>>>>>    ir[3]  xxxxx
>>>>>>>    ir[4]       xxxxx
>>>>>>>    ir[5]             xxxxx
>>>>>>>
>>>>>>>    ir2:   xxxxxxxxxx xxxxx       xxxxxxxx
>>>>>>>
>>>>>>>           1...5....1 ....1       ....2...<- "frame" coordinate system
>>>>>>>                    0     5           0
>>>>>>>
>>>>>>> I'll document this.
>>>>>>>
>>>>>>> H.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks for listening!
>>>>>>>>
>>>>>>>> ~Malcolm
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioc-devel-bounces at r-project.org
>>>>>>>>> [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Hervé Pagès
>>>>>>>>> Sent: Thursday, March 15, 2012 1:55 PM
>>>>>>>>> To: Kasper Daniel Hansen
>>>>>>>>> Cc: bioc-devel at r-project.org
>>>>>>>>> Subject: Re: [Bioc-devel] mapping between original and reduced ranges
>>>>>>>>>
>>>>>>>>> Hi reducers,
>>>>>>>>>
>>>>>>>>> I agree it "feels wrong" to use findOverlaps() to extract the mapping
>>>>>>>>> from original to reduced ranges. Even if it can be computed very
>>>>>>>>> easily with:
>>>>>>>>>
>>>>>>>>>      findOverlaps(gr, reduce(gr), select="first")
>>>>>>>>>
>>>>>>>>> (Note that using 'queryHits(findOverlaps(reduce(gr), gr))' only
>>>>>>>>> produces the correct result if 'gr' is already sorted by increasing
>>>>>>>>> order.)
>>>>>>>>>
>>>>>>>>> I think it would be easy for reduce() internal code to produce this
>>>>>>>>> mapping. The question is: how do we give it back to the user?
>>>>>>>>>
>>>>>>>>> Is it OK to use an attribute for this? reduce() already uses this
>>>>>>>>> for returning some extra information about the reduction:
>>>>>>>>>
>>>>>>>>>      >      ir
>>>>>>>>>      IRanges of length 5
>>>>>>>>>          start end width
>>>>>>>>>      [1]     1   5     5
>>>>>>>>>      [2]     6  10     5
>>>>>>>>>      [3]    12  16     5
>>>>>>>>>      [4]    24  28     5
>>>>>>>>>      [5]    27  31     5
>>>>>>>>>      >      ir2<- reduce(ir, with.inframe.attrib=TRUE)
>>>>>>>>>      >      ir2
>>>>>>>>>      IRanges of length 3
>>>>>>>>>          start end width
>>>>>>>>>      [1]     1  10    10
>>>>>>>>>      [2]    12  16     5
>>>>>>>>>      [3]    24  31     8
>>>>>>>>>      >      attr(ir2, "inframe")
>>>>>>>>>      IRanges of length 5
>>>>>>>>>          start end width
>>>>>>>>>      [1]     1   5     5
>>>>>>>>>      [2]     6  10     5
>>>>>>>>>      [3]    11  15     5
>>>>>>>>>      [4]    16  20     5
>>>>>>>>>      [5]    19  23     5
>>>>>>>>>
>>>>>>>>> We could do the same thing for the mapping from original to reduced
>>>>>>>>> ranges with e.g. an argument called 'with.mapping.attrib'.
>>>>>>>>> Would that work?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> H.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/15/2012 05:44 AM, Kasper Daniel Hansen wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So the key question is to what extent keeping track of where the
>>>>>>>>>> ranges come from would slow down the reduce operation.  I am not
>>>>>>>>>> familiar enough with the algorithm to know this, but given how fast
>>>>>>>>>> IRanges is in general, I am not one for guessing on this.
>>>>>>>>>>
>>>>>>>>>> I agree with Florian that this is a very typical use case.
>>>>>>>>>>
>>>>>>>>>> Kasper
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 15, 2012 at 5:02 AM, Hahne, Florian
>>>>>>>>>> <florian.hahne at novartis.com>       wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> It is true that this is not terribly slow when you deal with fairly
>>>>>>>>>>> large range objects:
>>>>>>>>>>>
>>>>>>>>>>> foo<- GRanges(seqnames=sample(1:4, 1e6, TRUE),
>>>>>>>>>>>               ranges=IRanges(start=as.integer(runif(min=1, max=1e7, n=1e6)),
>>>>>>>>>>>                              width=50))
>>>>>>>>>>>
>>>>>>>>>>> system.time(bar<- reduce(foo))
>>>>>>>>>>>      user  system elapsed
>>>>>>>>>>>     0.918   0.174   1.091
>>>>>>>>>>>
>>>>>>>>>>> system.time(foobar<- findOverlaps(foo, bar))
>>>>>>>>>>>      user  system elapsed
>>>>>>>>>>>     2.051   0.402   2.453
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However the whole process does take about 3x the time of just the
>>>>>>>>>>> reduce operation, and in my use case I want this to happen
>>>>>>>>>>> interactively, where waiting 3 seconds compared to 1 makes a huge
>>>>>>>>>>> difference...
>>>>>>>>>>>
>>>>>>>>>>> I wouldn't push this high up on the development agenda, but it seems
>>>>>>>>>>> to be something that is already 95% existing and could easily be
>>>>>>>>>>> added. But maybe I am wrong...
>>>>>>>>>>>
>>>>>>>>>>> Florian
>>>>>>>>>>>
>>>>>>>>>>> On 3/14/12 9:40 PM, "Kasper Daniel Hansen" <kasperdanielhansen at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We have discussed this a couple of times.  I routinely use the reduce
>>>>>>>>>>>> followed by findOverlaps paradigm.  As Malcolm says it feels wrong,
>>>>>>>>>>>> but from a practical point of view it is pretty fast, so I stopped
>>>>>>>>>>>> worrying about it.  I only think there is a reason to do this if it
>>>>>>>>>>>> is substantially faster.
>>>>>>>>>>>>
>>>>>>>>>>>> Kasper
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 14, 2012 at 3:46 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> Chiming in....
>>>>>>>>>>>>>
>>>>>>>>>>>>> on a similar note....
>>>>>>>>>>>>>
>>>>>>>>>>>>> A version of `disjoin` which returns a Hits/RangesMapping in addition
>>>>>>>>>>>>> to the GRanges result would be most useful and probably not require
>>>>>>>>>>>>> much additional effort (assuming `disjoin` computes this internally)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course, it is easy to live without since I can just perform the
>>>>>>>>>>>>> findOverlaps myself after the disjoin.... it just "feels wrong" (tm)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ahoy!
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Malcolm
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: bioc-devel-bounces at r-project.org
>>>>>>>>>>>>>> [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Hahne, Florian
>>>>>>>>>>>>>> Sent: Wednesday, March 14, 2012 2:22 PM
>>>>>>>>>>>>>> To: bioc-devel at r-project.org
>>>>>>>>>>>>>> Subject: [Bioc-devel] mapping between original and reduced ranges
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This bounced before, guess the mailing list does not like HTML mails.
>>>>>>>>>>>>>> So one more try:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had the following offline discussion with Michael about how one
>>>>>>>>>>>>>> could retain a mapping of the ranges in a GRanges object before and
>>>>>>>>>>>>>> after reduce. He suggested to take it to the list. Is that something
>>>>>>>>>>>>>> that could be added to GenomicRanges/IRanges?
>>>>>>>>>>>>>> Florian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a slightly tricky application for which I need to reduce a
>>>>>>>>>>>>>> GRanges object, but I would like to be able to process some of the
>>>>>>>>>>>>>> original elementMetadata of the merged ranges later. The only way I
>>>>>>>>>>>>>> was able to figure out which of the original ranges correspond to the
>>>>>>>>>>>>>> merged ranges was to perform a findOverlaps operation, but of course
>>>>>>>>>>>>>> that is rather costly. Is there a way to get the merge information
>>>>>>>>>>>>>> out of the original reduce call?
>>>>>>>>>>>>>> Here is a brief example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gr<- GRanges(seqnames="chr1",
>>>>>>>>>>>>>>              ranges=IRanges(start=c(1,6,12,24,27), width=5),
>>>>>>>>>>>>>>              foo=1:5, bar=letters[1:5])
>>>>>>>>>>>>>> gr2<- reduce(gr, min.gapwidth=1)
>>>>>>>>>>>>>> ind<- queryHits(findOverlaps(gr2, gr))
>>>>>>>>>>>>>> split(values(gr), ind)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, this is the idiom. I could see an improvement where
>>>>>>>>>>>>>> reduce or a similarly named function would return a Hits object (in
>>>>>>>>>>>>>> addition to the actual reduce result) that would indicate the mapping
>>>>>>>>>>>>>> between the input and reduced ranges. The RangesMapping structure
>>>>>>>>>>>>>> would be really close to what we would need.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Michael

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

------------------------------

_______________________________________________
Bioc-devel mailing list
Bioc-devel at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-devel

End of Bioc-devel Digest, Vol 96, Issue 20
******************************************