[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?

Cook, Malcolm MEC at stowers.org
Mon Feb 23 18:11:11 CET 2015


Excellent news - will save the universe many hours of misunderstandings and code sleuthing.


-----Original Message-----
From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Hervé Pagès
Sent: Friday, February 20, 2015 8:35 PM
To: Kasper Daniel Hansen; Sean Davis
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?

Hi guys,

Agreed. This is addressed in IRanges 2.1.40. Now you get a warning:

   >  Views(RleList(b=21:24, a=11:16), RangesList(a=IRanges(2:3, 6), b=IRanges(1, 4)))
   RleViewsList of length 2
   names(2): a b
   Warning message:
   In RleViewsList(rleList = subject, rangesList = start) :
     'rleList' was reordered so that its names match the names on
     'rangesList'

or an error:

   >  Views(RleList(b=21:24, c=11:16), RangesList(a=IRanges(2:3, 6), b=IRanges(1, 4)))
   Error in RleViewsList(rleList = subject, rangesList = start) :
     when both 'rleList' and 'rangesList' have names, the set of names must
     be the same on each object

depending on the gravity of the situation.

Thanks for your feedback.
H.


On 02/20/2015 07:38 AM, Kasper Daniel Hansen wrote:
> Clearly the Views method has to handle that, and in my opinion return 
> it in the order of cds list in this case.  Anything else is way too fragile.
>
> On Fri, Feb 20, 2015 at 8:14 AM, Sean Davis <seandavi at gmail.com> wrote:
>
>> On Fri, Feb 20, 2015 at 8:10 AM, Malcolm Perry <mgperry32 at gmail.com>
>> wrote:
>>
>>> Hi Sean,
>>>
>>> The idiom I've most often seen for getting round this problem is to 
>>> first find matching chromosomes in the RleList, and subset the 
>>> RangesList based on this order:
>>>
>>> chrs = intersect(names(rle_list), as.character(seqlevels(windows))) 
>>> myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])
>>>
>>
>> Thanks, Malcolm.
>>
>> This works just fine and fixes the problem.  However, I noticed that 
>> creating the Views() without doing what you suggest above does not 
>> result in a warning or error and results in an object that "works" 
>> but is nonsensical.  I simply want to protect myself from making the 
>> same mistake twice by some checking inside the Views() method.
>>
>> Sean
>>
>>
>>
>>>
>>> Hope this helps,
>>>
>>> Malcolm
>>>
>>> On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com> wrote:
>>>
>>>> I am calculating coverage metrics of a BAM file on the CDS regions.
>> When
>>>> I
>>>> form the RangesList and do coverage(), the resulting coverage 
>>>> vector applies the views to the regions from the RangesList without 
>>>> checking on matching ordering or seqlevels of the RleList and the 
>>>> RangesList.  This results, in this case, in Views from chr1 being 
>>>> applied to the coverage for chrM, for example.  Would it make sense 
>>>> to have the views method check
>> the
>>>> ordering and seqlevels (and perhaps even do the reordering, if
>> necessary)?
>>>>
>>>> Example code showing the problem (not fully reproducible--sorry).
>>>>
>>>> Thanks,
>>>> Sean
>>>>
>>>>
>>>>> cdsreg
>>>> GRanges object with 237533 ranges and 1 metadata column:
>>>>             seqnames               ranges strand   |    cds_id
>>>>                <Rle>            <IRanges>  <Rle>   | <integer>
>>>>         [1]     chr1     [ 12190,  12227]      +   |         1
>>>>         [2]     chr1     [ 12595,  12721]      +   |         2
>>>>         [3]     chr1     [ 13403,  13639]      +   |         3
>>>>         [4]     chr1     [ 69091,  70008]      +   |         4
>>>>         [5]     chr1     [324343, 324345]      +   |         5
>>>>         ...      ...                  ...    ... ...       ...
>>>>    [237529]     chrY [26959330, 26959332]      -   |    227333
>>>>    [237530]     chrY [27184245, 27184263]      -   |    227334
>>>>    [237531]     chrY [27184956, 27185061]      -   |    227335
>>>>    [237532]     chrY [27187916, 27188033]      -   |    227336
>>>>    [237533]     chrY [27190093, 27190170]      -   |    227337
>>>>    -------
>>>>    seqinfo: 93 sequences (1 circular) from hg19 genome
>>>>> cdsrl = as(cdsreg,'RangesList')
>>>>> names(cdsrl)
>>>>   [1] "chr1"                  "chr2"                  "chr3"
>>>>   "chr4"
>>>>   [5] "chr5"                  "chr6"                  "chr7"
>>>>   "chr8"
>>>>   [9] "chr9"                  "chr10"                 "chr11"
>>>>    "chr12"
>>>> [13] "chr13"                 "chr14"                 "chr15"
>>>>    "chr16"
>>>> [17] "chr17"                 "chr18"                 "chr19"
>>>>    "chr20"
>>>> [21] "chr21"                 "chr22"                 "chrX"
>>>>   "chrY"
>>>> [25] "chrM"                  "chr1_gl000191_random"
>>>> "chr1_gl000192_random"
>>>>   "chr4_ctg9_hap1"
>>>> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>>>>    "chr6_cox_hap2"
>>>> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>>>>    "chr6_qbl_hap6"
>>>> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"
>>>> "chr8_gl000196_random"
>>>>   "chr8_gl000197_random"
>>>> [41] "chr9_gl000198_random"  "chr9_gl000199_random"
>>>> "chr9_gl000200_random"
>>>>   "chr9_gl000201_random"
>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>   "chrUn_gl000212"
>>>> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>>>>   "chrUn_gl000216"
>>>> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>>>>   "chrUn_gl000220"
>>>> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>>>>   "chrUn_gl000224"
>>>> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>>>>   "chrUn_gl000228"
>>>> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>>>>   "chrUn_gl000232"
>>>> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>>>>   "chrUn_gl000236"
>>>> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>>>>   "chrUn_gl000240"
>>>> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>>>>   "chrUn_gl000244"
>>>> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>>>>   "chrUn_gl000248"
>>>> [93] "chrUn_gl000249"
>>>>> names(cov)
>>>>   [1] "chrM"                  "chr1"                  "chr2"
>>>>   "chr3"
>>>>   [5] "chr4"                  "chr5"                  "chr6"
>>>>   "chr7"
>>>>   [9] "chr8"                  "chr9"                  "chr10"
>>>>    "chr11"
>>>> [13] "chr12"                 "chr13"                 "chr14"
>>>>    "chr15"
>>>> [17] "chr16"                 "chr17"                 "chr18"
>>>>    "chr19"
>>>> [21] "chr20"                 "chr21"                 "chr22"
>>>>    "chrX"
>>>> [25] "chrY"                  "chr1_gl000191_random"
>>>> "chr1_gl000192_random"
>>>>   "chr4_ctg9_hap1"
>>>> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>>>>    "chr6_cox_hap2"
>>>> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>>>>    "chr6_qbl_hap6"
>>>> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"
>>>> "chr8_gl000196_random"
>>>>   "chr8_gl000197_random"
>>>> [41] "chr9_gl000198_random"  "chr9_gl000199_random"
>>>> "chr9_gl000200_random"
>>>>   "chr9_gl000201_random"
>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>   "chrUn_gl000212"
>>>> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>>>>   "chrUn_gl000216"
>>>> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>>>>   "chrUn_gl000220"
>>>> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>>>>   "chrUn_gl000224"
>>>> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>>>>   "chrUn_gl000228"
>>>> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>>>>   "chrUn_gl000232"
>>>> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>>>>   "chrUn_gl000236"
>>>> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>>>>   "chrUn_gl000240"
>>>> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>>>>   "chrUn_gl000244"
>>>> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>>>>   "chrUn_gl000248"
>>>> [93] "chrUn_gl000249"
>>>>> covView = Views(cov,cdsrl)
>>>>> covView[[1]]
>>>> Views on a 16571-length Rle subject
>>>>
>>>> views:
>>>>              start       end width
>>>>      [1]     12190     12227    38 [1367 1357 1363 1358 1347 1375 1381
>> 1379
>>>> 1381 1387 1385 1377 1382 1368 1363 ...]
>>>>      [2]     12595     12721   127 [1410 1416 1414 1421 1430 1430 1428
>> 1432
>>>> 1428 1419 1421 1418 1426 1427 1439 ...]
>>>>      [3]     13403     13639   237 [1476 1468 1460 1461 1465 1455 1448
>> 1448
>>>> 1442 1448 1460 1460 1458 1435 1440 ...]
>>>>      [4]     69091     70008   918 [ ]
>>>>      [5]    324343    324345     3 [ ]
>>>>      [6]    324439    325605  1167 [ ]
>>>>      [7]    324515    324686   172 [ ]
>>>>      [8]    324719    325124   406 [ ]
>>>>      [9]    325383    325605   223 [ ]
>>>>      ...       ...       ...   ... ...
>>>> [23550] 249149924 249150145   222 [ ]
>>>> [23551] 249150487 249150533    47 [ ]
>>>> [23552] 249150487 249150621   135 [ ]
>>>> [23553] 249150713 249150761    49 [ ]
>>>> [23554] 249151433 249151696   264 [ ]
>>>> [23555] 249152027 249152058    32 [ ]
>>>> [23556] 249152330 249152508   179 [ ]
>>>> [23557] 249152330 249152520   191 [ ]
>>>> [23558] 249152711 249152713     3 [ ]
>>>>
>>>>> sessionInfo()R Under development (unstable) (2014-11-18 r66997)
>>>>
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats4    parallel  stats     graphics  grDevices utils
>>>> datasets  methods   base
>>>>
>>>> other attached packages:
>>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
>>>>   [3] AnnotationDbi_1.29.17                   Biobase_2.27.1
>>>>   [5] GenomicAlignments_1.3.27                VariantAnnotation_1.13.24
>>>>   [7] Rsamtools_1.19.26                       Biostrings_2.35.7
>>>>   [9] XVector_0.7.3                           GenomicRanges_1.19.35
>>>> [11] GenomeInfoDb_1.3.12                     IRanges_2.1.35
>>>> [13] S4Vectors_0.5.17                        BiocGenerics_0.13.4
>>>> [15] roxygen2_4.1.0                          BiocInstaller_1.17.5
>>>>
>>>> loaded via a namespace (and not attached):
>>>>   [1] BBmisc_1.8          BSgenome_1.35.16    BatchJobs_1.5
>>>> BiocParallel_1.1.12 DBI_0.3.1
>>>>   [6] RCurl_1.95-4.5      RSQLite_1.0.0       Rcpp_0.11.4
>>>> XML_3.98-1.1        base64enc_0.1-2
>>>> [11] biomaRt_2.23.5      bitops_1.0-6        brew_1.0-6
>>>> checkmate_1.5.1     codetools_0.2-10
>>>> [16] digest_0.6.8        fail_1.2            foreach_1.4.2
>>>> iterators_1.0.7     rtracklayer_1.27.7
>>>> [21] sendmailR_1.2-1     stringr_0.6.2       tools_3.2.0
>>>>   zlibbioc_1.13.0
>>>>
>>>>          [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list 
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>>          [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list