[BioC] nearest() for GRanges

Cook, Malcolm MEC at stowers.org
Mon Jun 18 23:25:18 CEST 2012


Martin, Oleg, Val, all,

I too have script logic that benefitted from and depends upon what the
behavior of nearest,GenomicRanges,missing as reported by Oleg.

Thanks for the unit tests Martin.

If it helps in sleuthing, in my hands, the 3rd test used to pass (if my
memory serves), but does not pass now, as the attached transcript shows.

Hoping it helps find a speedy resolution to this issue,

~ Malcolm Cook



>  r <- IRanges(c(1,5,10), c(2,7,12))
>  g <- GRanges("chr1", r, "+")
>  checkEquals(precede(r), precede(g))
[1] TRUE
>   checkEquals(follow(r), follow(g))
[1] TRUE
>  try(checkEquals(nearest(r), nearest(g)))
Error in checkEquals(nearest(r), nearest(g)) :
  Mean relative difference: 0.6


> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
 [1] tools     splines   parallel  stats     graphics  grDevices utils
datasets  methods   base

other attached packages:
 [1] RUnit_0.4.26          log4r_0.1-4           vwr_0.1
RecordLinkage_0.4-1   ffbase_0.5            ff_2.2-7
bit_1.1-8             evd_2.2-6             ipred_0.8-13
prodlim_1.3.1         KernSmooth_2.23-7     nnet_7.3-1
survival_2.36-14      mlbench_2.1-0         MASS_7.3-18
ada_2.0-2             rpart_3.1-53          e1071_1.6
class_7.3-3           XLConnect_0.1-9       XLConnectJars_0.1-4
rJava_0.9-3           latticeExtra_0.6-19   RColorBrewer_1.0-5
lattice_0.20-6        doMC_1.2.5            multicore_0.1-7
[28] BSgenome_1.24.0       rtracklayer_1.16.1    Rsamtools_1.8.5
Biostrings_2.24.1     GenomicFeatures_1.8.1 AnnotationDbi_1.18.1
GenomicRanges_1.8.6   IRanges_1.14.3        Biobase_2.16.0
BiocGenerics_0.2.0    data.table_1.8.0      compare_0.2-3
svUnit_0.7-10         doParallel_1.0.1      iterators_1.0.6
foreach_1.4.0         ggplot2_0.9.1         sqldf_0.4-6.4
RSQLite.extfuns_0.0.1 RSQLite_0.11.1        chron_2.3-42
gsubfn_0.6-3          proto_0.3-9.2         DBI_0.2-5
functional_0.1        reshape_0.8.4         plyr_1.7.1
[55] stringr_0.6           gtools_2.6.2

loaded via a namespace (and not attached):
 [1] RCurl_1.91-1     XML_3.9-4        biomaRt_2.12.0   bitops_1.0-4.1
codetools_0.2-8  colorspace_1.1-1 compiler_2.15.0  dichromat_1.2-4
digest_0.5.2     grid_2.15.0      labeling_0.1     memoise_0.1
munsell_0.3      reshape2_1.2.1   scales_0.2.1     stats4_2.15.0
tcltk_2.15.0     zlibbioc_1.2.0






On 6/18/12 2:39 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:

>Hi Oleg --
>
>On 06/17/2012 11:11 PM, Oleg Mayba wrote:
>> Hi,
>>
>> I just noticed that a piece of logic I was relying on with GRanges
>>before
>> does not seem to work anymore.  Namely, I expect the behavior of
>>nearest()
>> with a single GRanges object as an argument to be the same as that of
>> IRanges (example below), but it's not anymore.  I expect nearest(GR1)
>>NOT
>> to behave trivially but to return the closest range OTHER than the range
>> itself.  I could swear that was the case before, but isn't any longer:
>
>I think you're right that there is an inconsistency here; Val will
>likely help clarify in a day or so. My two cents...
>
>I think, certainly, that GRanges on a single chromosome on the "+"
>strand should behave like an IRanges
>
>   library(GenomicRanges)
>   library(RUnit)
>
>   r <- IRanges(c(1,5,10), c(2,7,12))
>   g <- GRanges("chr1", r, "+")
>
>   ## first two ok, third should work but fails
>   checkEquals(precede(r), precede(g))
>   checkEquals(follow(r), follow(g))
>   try(checkEquals(nearest(r), nearest(g)))
>
>Also, on the "-" strand I think we're expecting
>
>   g <- GRanges("chr1", r, "-")
>
>   ## first two ok, third should work but fails
>   checkEquals(follow(r), precede(g))
>   checkEquals(precede(r), follow(g))
>   try(checkEquals(nearest(r), nearest(g)))
>
>For "*" (which was your example) I think the situation is (a) different
>in devel than in release; and (b) not so clear. In devel, "*" is (from
>method?"nearest,GenomicRanges,missing") "x on '*' strand can match to
>ranges on any of ''+'', ''-'' or ''*''" and in particular I think these
>are always true:
>
>   checkEquals(precede(g), follow(g))
>   checkEquals(nearest(r), follow(g))
>
>I would also expect
>
>   try(checkEquals(nearest(g), follow(g)))
>
>though this seems not to be the case. In 'release', "*" is coereced and
>behaves as if on the "+" strand (I think).
>
>Martin
>
>>
>>> z=IRanges(start=c(1,5,10), end=c(2,7,12))
>>> z
>> IRanges of length 3
>>      start end width
>> [1]     1   2     2
>> [2]     5   7     3
>> [3]    10  12     3
>>> nearest(z)
>> [1] 2 1 2
>>>
>>> z=GRanges(seqnames=rep('chr1',3),ranges=IRanges(start=c(1,5,10),
>> end=c(2,7,12)))
>>> z
>> GRanges with 3 ranges and 0 elementMetadata cols:
>>        seqnames    ranges strand
>>           <Rle>  <IRanges>   <Rle>
>>    [1]     chr1  [ 1,  2]      *
>>    [2]     chr1  [ 5,  7]      *
>>    [3]     chr1  [10, 12]      *
>>    ---
>>    seqlengths:
>>     chr1
>>       NA
>>> nearest(z)
>> [1] 1 2 3
>>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] datasets  utils     grDevices graphics  stats     methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] stats4_2.15.0
>>>
>>
>>
>>
>> I want the IRanges behavior and not what seems currently to be the
>>GRanges
>> behavior, since I have some code that depends on it. Is there a quick
>>way
>> to make nearest() do that for me again?
>>
>> Thanks!
>>
>> Oleg.
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>-- 
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list