[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?

Malcolm Perry mgperry32 at gmail.com
Fri Feb 20 14:10:17 CET 2015


Hi Sean,

The idiom I've most often seen for getting round this problem is to first
find matching chromosomes in the RleList, and subset the RangesList based
on this order:

chrs = intersect(names(rle_list), as.character(seqlevels(windows)))
myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])

Hope this helps,

Malcolm

On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com> wrote:

> I am calculating coverage metrics of a BAM file on the CDS regions.  When I
> form the RangesList and do coverage(), the resulting coverage vector
> applies the views to the regions from the RangesList without checking on
> matching ordering or seqlevels of the RleList and the RangesList.  This
> results, in this case, in Views from chr1 being applied to the coverage for
> chrM, for example.  Would it make sense to have the views method check the
> ordering and seqlevels (and perhaps even do the reordering, if necessary)?
>
> Example code showing the problem (not fully reproducible--sorry).
>
> Thanks,
> Sean
>
>
> > cdsreg
> GRanges object with 237533 ranges and 1 metadata column:
>            seqnames               ranges strand   |    cds_id
>               <Rle>            <IRanges>  <Rle>   | <integer>
>        [1]     chr1     [ 12190,  12227]      +   |         1
>        [2]     chr1     [ 12595,  12721]      +   |         2
>        [3]     chr1     [ 13403,  13639]      +   |         3
>        [4]     chr1     [ 69091,  70008]      +   |         4
>        [5]     chr1     [324343, 324345]      +   |         5
>        ...      ...                  ...    ... ...       ...
>   [237529]     chrY [26959330, 26959332]      -   |    227333
>   [237530]     chrY [27184245, 27184263]      -   |    227334
>   [237531]     chrY [27184956, 27185061]      -   |    227335
>   [237532]     chrY [27187916, 27188033]      -   |    227336
>   [237533]     chrY [27190093, 27190170]      -   |    227337
>   -------
>   seqinfo: 93 sequences (1 circular) from hg19 genome
> > cdsrl = as(cdsreg,'RangesList')
> > names(cdsrl)
>  [1] "chr1"                  "chr2"                  "chr3"
>  "chr4"
>  [5] "chr5"                  "chr6"                  "chr7"
>  "chr8"
>  [9] "chr9"                  "chr10"                 "chr11"
>   "chr12"
> [13] "chr13"                 "chr14"                 "chr15"
>   "chr16"
> [17] "chr17"                 "chr18"                 "chr19"
>   "chr20"
> [21] "chr21"                 "chr22"                 "chrX"
>  "chrY"
> [25] "chrM"                  "chr1_gl000191_random"  "chr1_gl000192_random"
>  "chr4_ctg9_hap1"
> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>   "chr6_cox_hap2"
> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>   "chr6_qbl_hap6"
> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"  "chr8_gl000196_random"
>  "chr8_gl000197_random"
> [41] "chr9_gl000198_random"  "chr9_gl000199_random"  "chr9_gl000200_random"
>  "chr9_gl000201_random"
> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
> "chr17_gl000203_random" "chr17_gl000204_random"
> [49] "chr17_gl000205_random" "chr17_gl000206_random"
> "chr18_gl000207_random" "chr19_gl000208_random"
> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>  "chrUn_gl000212"
> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>  "chrUn_gl000216"
> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>  "chrUn_gl000220"
> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>  "chrUn_gl000224"
> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>  "chrUn_gl000228"
> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>  "chrUn_gl000232"
> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>  "chrUn_gl000236"
> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>  "chrUn_gl000240"
> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>  "chrUn_gl000244"
> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>  "chrUn_gl000248"
> [93] "chrUn_gl000249"
> > names(cov)
>  [1] "chrM"                  "chr1"                  "chr2"
>  "chr3"
>  [5] "chr4"                  "chr5"                  "chr6"
>  "chr7"
>  [9] "chr8"                  "chr9"                  "chr10"
>   "chr11"
> [13] "chr12"                 "chr13"                 "chr14"
>   "chr15"
> [17] "chr16"                 "chr17"                 "chr18"
>   "chr19"
> [21] "chr20"                 "chr21"                 "chr22"
>   "chrX"
> [25] "chrY"                  "chr1_gl000191_random"  "chr1_gl000192_random"
>  "chr4_ctg9_hap1"
> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>   "chr6_cox_hap2"
> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>   "chr6_qbl_hap6"
> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"  "chr8_gl000196_random"
>  "chr8_gl000197_random"
> [41] "chr9_gl000198_random"  "chr9_gl000199_random"  "chr9_gl000200_random"
>  "chr9_gl000201_random"
> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
> "chr17_gl000203_random" "chr17_gl000204_random"
> [49] "chr17_gl000205_random" "chr17_gl000206_random"
> "chr18_gl000207_random" "chr19_gl000208_random"
> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>  "chrUn_gl000212"
> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>  "chrUn_gl000216"
> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>  "chrUn_gl000220"
> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>  "chrUn_gl000224"
> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>  "chrUn_gl000228"
> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>  "chrUn_gl000232"
> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>  "chrUn_gl000236"
> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>  "chrUn_gl000240"
> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>  "chrUn_gl000244"
> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>  "chrUn_gl000248"
> [93] "chrUn_gl000249"
> > covView = Views(cov,cdsrl)
> > covView[[1]]
> Views on a 16571-length Rle subject
>
> views:
>             start       end width
>     [1]     12190     12227    38 [1367 1357 1363 1358 1347 1375 1381 1379
> 1381 1387 1385 1377 1382 1368 1363 ...]
>     [2]     12595     12721   127 [1410 1416 1414 1421 1430 1430 1428 1432
> 1428 1419 1421 1418 1426 1427 1439 ...]
>     [3]     13403     13639   237 [1476 1468 1460 1461 1465 1455 1448 1448
> 1442 1448 1460 1460 1458 1435 1440 ...]
>     [4]     69091     70008   918 [ ]
>     [5]    324343    324345     3 [ ]
>     [6]    324439    325605  1167 [ ]
>     [7]    324515    324686   172 [ ]
>     [8]    324719    325124   406 [ ]
>     [9]    325383    325605   223 [ ]
>     ...       ...       ...   ... ...
> [23550] 249149924 249150145   222 [ ]
> [23551] 249150487 249150533    47 [ ]
> [23552] 249150487 249150621   135 [ ]
> [23553] 249150713 249150761    49 [ ]
> [23554] 249151433 249151696   264 [ ]
> [23555] 249152027 249152058    32 [ ]
> [23556] 249152330 249152508   179 [ ]
> [23557] 249152330 249152520   191 [ ]
> [23558] 249152711 249152713     3 [ ]
>
> > sessionInfo()R Under development (unstable) (2014-11-18 r66997)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils
> datasets  methods   base
>
> other attached packages:
>  [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
>  [3] AnnotationDbi_1.29.17                   Biobase_2.27.1
>  [5] GenomicAlignments_1.3.27                VariantAnnotation_1.13.24
>  [7] Rsamtools_1.19.26                       Biostrings_2.35.7
>  [9] XVector_0.7.3                           GenomicRanges_1.19.35
> [11] GenomeInfoDb_1.3.12                     IRanges_2.1.35
> [13] S4Vectors_0.5.17                        BiocGenerics_0.13.4
> [15] roxygen2_4.1.0                          BiocInstaller_1.17.5
>
> loaded via a namespace (and not attached):
>  [1] BBmisc_1.8          BSgenome_1.35.16    BatchJobs_1.5
> BiocParallel_1.1.12 DBI_0.3.1
>  [6] RCurl_1.95-4.5      RSQLite_1.0.0       Rcpp_0.11.4
> XML_3.98-1.1        base64enc_0.1-2
> [11] biomaRt_2.23.5      bitops_1.0-6        brew_1.0-6
> checkmate_1.5.1     codetools_0.2-10
> [16] digest_0.6.8        fail_1.2            foreach_1.4.2
> iterators_1.0.7     rtracklayer_1.27.7
> [21] sendmailR_1.2-1     stringr_0.6.2       tools_3.2.0
>  zlibbioc_1.13.0
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list