[BioC] nearest() for GRanges

Cook, Malcolm MEC at stowers.org
Tue Jun 26 19:59:27 CEST 2012


Excellent.  All's well that ends well!  Thanks much.... cooking with gas
again....

--Malcolm


On 6/25/12 5:09 PM, "Dan Tenenbaum" <dtenenba at fhcrc.org> wrote:

>On Mon, Jun 25, 2012 at 2:53 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Hi Valerie,
>>
>> Indeed good news.
>>
>> However, I am finding that this newest version is not yet available view
>> biocLite from repository at bioconductor.org.  I am still picking up
>>1.8.6
>> with biocLite('GenomicRanges').
>>
>> Should I expect to wait, or perhaps is there a 'push' at your end that
>> needs attending?
>>
>> Please advise if I'm expecting it to appear before its time ;)
>
>Out build cycle runs once a day, so expect to see the next version
>tomorrow morning around 10AM Seattle time. If you want to get it
>before then, you can check it out from the svn repository.
>
>Thanks,
>Dan
>
>
>>
>> Thanks!
>>
>> Malcolm
>>
>>
>>
>> On 6/25/12 1:53 PM, "Valerie Obenchain" <vobencha at fhcrc.org> wrote:
>>
>>>This is now fixed in release, BiocC 2.10, GenomicRanges 1.8.7.
>>>
>>>Note the behavior of '*' is different from previous behavior (i.e., <= v
>>>1.8.6). Treatment of '*' ranges was one of the aspects we clarified and
>>>enforced in the the recent update of precede, follows and nearest.
>>>
>>>Previously in release '*' was treated as a '+' range,
>>>
>>>g <- GRanges("chr1", IRanges(c(1,5,10), c(2,7,12)), "*")
>>> > g
>>>GRanges with 3 ranges and 0 elementMetadata cols:
>>>       seqnames    ranges strand
>>><Rle> <IRanges> <Rle>
>>>   [1]     chr1  [ 1,  2]      *
>>>   [2]     chr1  [ 5,  7]      *
>>>   [3]     chr1  [10, 12]      *
>>>   ---
>>>   seqlengths:
>>>    chr1
>>>      NA
>>> > precede(g)
>>>[1]  2  3 NA
>>> > follow(g)
>>>[1] NA  1  2
>>> > nearest(g)
>>>[1] 2 1 2
>>>
>>>
>>>The new behavior of '*' (in both release and devel) considers both '+'
>>>and '-' possibilities. For details see the 'matching by strand' section
>>>described in precede() on the man page for ?GRanges.
>>>
>>> > precede(g)
>>>[1] 2 1 2
>>> > follow(g)
>>>[1] 2 1 2
>>> > nearest(g)
>>>[1] 2 1 2
>>>
>>>
>>>Valerie
>>>
>>>On 06/22/2012 03:25 PM, Cook, Malcolm wrote:
>>>> Great news, Valerie... thanks very much... I will take immediate
>>>>advantage
>>>> of this... after re-reading your report of 'an overhaul' I would well
>>>> understand if back-porting your fix in dev to release would be onerous
>>>>to
>>>> impossible.
>>>>
>>>> I hope it goes quickly and smoothly....
>>>>
>>>> Cheers,
>>>>
>>>> Malcolm
>>>>
>>>>
>>>> On 6/22/12 4:00 PM, "Valerie Obenchain"<vobencha at fhcrc.org>  wrote:
>>>>
>>>>> On 06/20/2012 05:20 PM, Cook, Malcolm wrote:
>>>>>> Hi Valerie,
>>>>>>
>>>>>> Very glad you found and fixed the root cause.
>>>>>>
>>>>>> I don't know the overhead it would take for you, but, this being a
>>>>>> regression, might you consider fixing in Bioconductor 2.10 as, say
>>>>>> GenomicRanges_1.8.
>>>>>>
>>>>> Yes, I will fix this in release too. If not today then first thing
>>>>>next
>>>>> week.
>>>>>
>>>>> Valerie
>>>>>> Thanks for your consideration,
>>>>>>
>>>>>> Malcolm
>>>>>>
>>>>>> On 6/20/12 3:13 PM, "Valerie Obenchain"<vobencha at fhcrc.org>   wrote:
>>>>>>
>>>>>>> Hi Oleg, Malcom,
>>>>>>>
>>>>>>> Thanks for the bug report. This is now fixed in devel 1.9.28.  Over
>>>>>>>the
>>>>>>> past months we've done an overhaul of the precede/follow code in
>>>>>>>devel.
>>>>>>> The new nearest method is based on the new precede and follow and
>>>>>>>is
>>>>>>> documented at
>>>>>>>
>>>>>>> ?'nearest,GenomicRanges,GenomicRanges-method'
>>>>>>>
>>>>>>> Let me know if you run into problems.
>>>>>>>
>>>>>>> Valerie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 06/18/2012 02:25 PM, Cook, Malcolm wrote:
>>>>>>>> Martin, Oleg, Val, all,
>>>>>>>>
>>>>>>>> I too have script logic that benefitted from and depends upon what
>>>>>>>>the
>>>>>>>> behavior of nearest,GenomicRanges,missing as reported by Oleg.
>>>>>>>>
>>>>>>>> Thanks for the unit tests Martin.
>>>>>>>>
>>>>>>>> If it helps in sleuthing, in my hands, the 3rd test used to pass
>>>>>>>>(if
>>>>>>>> my
>>>>>>>> memory serves), but does not pass now, as the attached transcript
>>>>>>>> shows.
>>>>>>>>
>>>>>>>> Hoping it helps find a speedy resolution to this issue,
>>>>>>>>
>>>>>>>> ~ Malcolm Cook
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>     r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>>>>     g<- GRanges("chr1", r, "+")
>>>>>>>>>     checkEquals(precede(r), precede(g))
>>>>>>>> [1] TRUE
>>>>>>>>>      checkEquals(follow(r), follow(g))
>>>>>>>> [1] TRUE
>>>>>>>>>     try(checkEquals(nearest(r), nearest(g)))
>>>>>>>> Error in checkEquals(nearest(r), nearest(g)) :
>>>>>>>>      Mean relative difference: 0.6
>>>>>>>>
>>>>>>>>
>>>>>>>>> sessionInfo()
>>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> [1] C
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>>     [1] tools     splines   parallel  stats     graphics
>>>>>>>>grDevices
>>>>>>>> utils
>>>>>>>> datasets  methods   base
>>>>>>>>
>>>>>>>> other attached packages:
>>>>>>>>     [1] RUnit_0.4.26          log4r_0.1-4           vwr_0.1
>>>>>>>> RecordLinkage_0.4-1   ffbase_0.5            ff_2.2-7
>>>>>>>> bit_1.1-8             evd_2.2-6             ipred_0.8-13
>>>>>>>> prodlim_1.3.1         KernSmooth_2.23-7     nnet_7.3-1
>>>>>>>> survival_2.36-14      mlbench_2.1-0         MASS_7.3-18
>>>>>>>> ada_2.0-2             rpart_3.1-53          e1071_1.6
>>>>>>>> class_7.3-3           XLConnect_0.1-9       XLConnectJars_0.1-4
>>>>>>>> rJava_0.9-3           latticeExtra_0.6-19   RColorBrewer_1.0-5
>>>>>>>> lattice_0.20-6        doMC_1.2.5            multicore_0.1-7
>>>>>>>> [28] BSgenome_1.24.0       rtracklayer_1.16.1    Rsamtools_1.8.5
>>>>>>>> Biostrings_2.24.1     GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
>>>>>>>> GenomicRanges_1.8.6   IRanges_1.14.3        Biobase_2.16.0
>>>>>>>> BiocGenerics_0.2.0    data.table_1.8.0      compare_0.2-3
>>>>>>>> svUnit_0.7-10         doParallel_1.0.1      iterators_1.0.6
>>>>>>>> foreach_1.4.0         ggplot2_0.9.1         sqldf_0.4-6.4
>>>>>>>> RSQLite.extfuns_0.0.1 RSQLite_0.11.1        chron_2.3-42
>>>>>>>> gsubfn_0.6-3          proto_0.3-9.2         DBI_0.2-5
>>>>>>>> functional_0.1        reshape_0.8.4         plyr_1.7.1
>>>>>>>> [55] stringr_0.6           gtools_2.6.2
>>>>>>>>
>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>     [1] RCurl_1.91-1     XML_3.9-4        biomaRt_2.12.0
>>>>>>>> bitops_1.0-4.1
>>>>>>>> codetools_0.2-8  colorspace_1.1-1 compiler_2.15.0  dichromat_1.2-4
>>>>>>>> digest_0.5.2     grid_2.15.0      labeling_0.1     memoise_0.1
>>>>>>>> munsell_0.3      reshape2_1.2.1   scales_0.2.1     stats4_2.15.0
>>>>>>>> tcltk_2.15.0     zlibbioc_1.2.0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/18/12 2:39 PM, "Martin Morgan"<mtmorgan at fhcrc.org>    wrote:
>>>>>>>>
>>>>>>>>> Hi Oleg --
>>>>>>>>>
>>>>>>>>> On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I just noticed that a piece of logic I was relying on with
>>>>>>>>>>GRanges
>>>>>>>>>> before
>>>>>>>>>> does not seem to work anymore.  Namely, I expect the behavior of
>>>>>>>>>> nearest()
>>>>>>>>>> with a single GRanges object as an argument to be the same as
>>>>>>>>>>that
>>>>>>>>>> of
>>>>>>>>>> IRanges (example below), but it's not anymore.  I expect
>>>>>>>>>> nearest(GR1)
>>>>>>>>>> NOT
>>>>>>>>>> to behave trivially but to return the closest range OTHER than
>>>>>>>>>>the
>>>>>>>>>> range
>>>>>>>>>> itself.  I could swear that was the case before, but isn't any
>>>>>>>>>> longer:
>>>>>>>>> I think you're right that there is an inconsistency here; Val
>>>>>>>>>will
>>>>>>>>> likely help clarify in a day or so. My two cents...
>>>>>>>>>
>>>>>>>>> I think, certainly, that GRanges on a single chromosome on the
>>>>>>>>>"+"
>>>>>>>>> strand should behave like an IRanges
>>>>>>>>>
>>>>>>>>>      library(GenomicRanges)
>>>>>>>>>      library(RUnit)
>>>>>>>>>
>>>>>>>>>      r<- IRanges(c(1,5,10), c(2,7,12))
>>>>>>>>>      g<- GRanges("chr1", r, "+")
>>>>>>>>>
>>>>>>>>>      ## first two ok, third should work but fails
>>>>>>>>>      checkEquals(precede(r), precede(g))
>>>>>>>>>      checkEquals(follow(r), follow(g))
>>>>>>>>>      try(checkEquals(nearest(r), nearest(g)))
>>>>>>>>>
>>>>>>>>> Also, on the "-" strand I think we're expecting
>>>>>>>>>
>>>>>>>>>      g<- GRanges("chr1", r, "-")
>>>>>>>>>
>>>>>>>>>      ## first two ok, third should work but fails
>>>>>>>>>      checkEquals(follow(r), precede(g))
>>>>>>>>>      checkEquals(precede(r), follow(g))
>>>>>>>>>      try(checkEquals(nearest(r), nearest(g)))
>>>>>>>>>
>>>>>>>>> For "*" (which was your example) I think the situation is (a)
>>>>>>>>> different
>>>>>>>>> in devel than in release; and (b) not so clear. In devel, "*" is
>>>>>>>>> (from
>>>>>>>>> method?"nearest,GenomicRanges,missing") "x on '*' strand can
>>>>>>>>>match
>>>>>>>>>to
>>>>>>>>> ranges on any of ''+'', ''-'' or ''*''" and in particular I think
>>>>>>>>> these
>>>>>>>>> are always true:
>>>>>>>>>
>>>>>>>>>      checkEquals(precede(g), follow(g))
>>>>>>>>>      checkEquals(nearest(r), follow(g))
>>>>>>>>>
>>>>>>>>> I would also expect
>>>>>>>>>
>>>>>>>>>      try(checkEquals(nearest(g), follow(g)))
>>>>>>>>>
>>>>>>>>> though this seems not to be the case. In 'release', "*" is
>>>>>>>>>coereced
>>>>>>>>> and
>>>>>>>>> behaves as if on the "+" strand (I think).
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>>>>>>>>>> z
>>>>>>>>>> IRanges of length 3
>>>>>>>>>>         start end width
>>>>>>>>>> [1]     1   2     2
>>>>>>>>>> [2]     5   7     3
>>>>>>>>>> [3]    10  12     3
>>>>>>>>>>> nearest(z)
>>>>>>>>>> [1] 2 1 2
>>>>>>>>>>> 
>>>>>>>>>>>z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>>>>>>>>>> end=c(2,7,12)))
>>>>>>>>>>> z
>>>>>>>>>> GRanges with 3 ranges and 0 elementMetadata cols:
>>>>>>>>>>           seqnames    ranges strand
>>>>>>>>>>              <Rle>     <IRanges>      <Rle>
>>>>>>>>>>       [1]     chr1  [ 1,  2]      *
>>>>>>>>>>       [2]     chr1  [ 5,  7]      *
>>>>>>>>>>       [3]     chr1  [10, 12]      *
>>>>>>>>>>       ---
>>>>>>>>>>       seqlengths:
>>>>>>>>>>        chr1
>>>>>>>>>>          NA
>>>>>>>>>>> nearest(z)
>>>>>>>>>> [1] 1 2 3
>>>>>>>>>>> sessionInfo()
>>>>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>>>>>>
>>>>>>>>>> locale:
>>>>>>>>>>      [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>>>>      [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>>>>      [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>>>>>>      [7] LC_PAPER=C                 LC_NAME=C
>>>>>>>>>>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>>>>
>>>>>>>>>> attached base packages:
>>>>>>>>>> [1] datasets  utils     grDevices graphics  stats     methods
>>>>>>>>>>base
>>>>>>>>>>
>>>>>>>>>> other attached packages:
>>>>>>>>>> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>>>>>>>>>>
>>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>>> [1] stats4_2.15.0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I want the IRanges behavior and not what seems currently to be
>>>>>>>>>>the
>>>>>>>>>> GRanges
>>>>>>>>>> behavior, since I have some code that depends on it. Is there a
>>>>>>>>>> quick
>>>>>>>>>> way
>>>>>>>>>> to make nearest() do that for me again?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Oleg.
>>>>>>>>>>
>>>>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>> Search the archives:
>>>>>>>>>> 
>>>>>>>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>> --
>>>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>>>> 1100 Fairview Ave. N.
>>>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>>>
>>>>>>>>> Location: Arnold Building M1 B861
>>>>>>>>> Phone: (206) 667-2793
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list