[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?
Malcolm Perry
mgperry32 at gmail.com
Fri Feb 20 14:10:17 CET 2015
Hi Sean,
The idiom I've most often seen for getting round this problem is to first
find the chromosomes present in both the RleList and the ranges, and then
subset both objects by that common vector, so they share the same names in
the same order:
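library(GenomicRanges)  # also attaches IRanges and GenomeInfoDb
chrs = intersect(names(rle_list), as.character(seqlevels(windows)))
# subset both objects by the same vector so elements pair up by chromosome name
myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])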
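To see why subsetting both objects by the same chrs vector matters, here is a minimal base-R sketch. It assumes plain named lists as stand-ins for the RleList and RangesList (the names cov_list, win_list and pair_up are hypothetical), so it runs without Bioconductor. Element-wise pairing with Map() is positional, which reproduces the chr1-on-chrM mix-up; indexing both lists by a shared character vector pairs them by name instead.

```r
# Hypothetical stand-ins for an RleList (coverage) and a RangesList (windows):
# plain named lists, so this runs in base R without Bioconductor.
cov_list <- list(chrM = "chrM coverage", chr1 = "chr1 coverage", chr2 = "chr2 coverage")
win_list <- list(chr1 = "chr1 windows", chr2 = "chr2 windows", chrX = "chrX windows")

# Record which window set ends up paired with each coverage element
pair_up <- function(cv, win) win

# Positional pairing: Map() combines element i with element i, ignoring names
positional <- Map(pair_up, cov_list, win_list)
positional$chrM  # "chr1 windows": the chr1 views land on chrM coverage

# Name-based pairing: subset both lists by a shared vector of names first
chrs <- intersect(names(cov_list), names(win_list))
by_name <- Map(pair_up, cov_list[chrs], win_list[chrs])
by_name$chr1     # "chr1 windows"
```

Note that intersect() keeps the order of its first argument, so the common order here follows the coverage list; chromosomes present in only one object (chrM, chrX) are dropped.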
Hope this helps,
Malcolm
On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com> wrote:
> I am calculating coverage metrics from a BAM file over the CDS regions. When I
> build a RangesList from the CDS ranges, compute coverage(), and then call
> Views() on the resulting coverage RleList, the views are applied to the
> RleList elements by position, without checking that the ordering or
> seqlevels of the RleList and the RangesList match. In this case, the views
> for chr1 end up applied to the coverage for chrM. Would it make sense to
> have the Views() method check the ordering and seqlevels (and perhaps even
> do the reordering, if necessary)?
>
> Example code showing the problem (not fully reproducible, sorry).
>
> Thanks,
> Sean
>
>
> > cdsreg
> GRanges object with 237533 ranges and 1 metadata column:
> seqnames ranges strand | cds_id
> <Rle> <IRanges> <Rle> | <integer>
> [1] chr1 [ 12190, 12227] + | 1
> [2] chr1 [ 12595, 12721] + | 2
> [3] chr1 [ 13403, 13639] + | 3
> [4] chr1 [ 69091, 70008] + | 4
> [5] chr1 [324343, 324345] + | 5
> ... ... ... ... ... ...
> [237529] chrY [26959330, 26959332] - | 227333
> [237530] chrY [27184245, 27184263] - | 227334
> [237531] chrY [27184956, 27185061] - | 227335
> [237532] chrY [27187916, 27188033] - | 227336
> [237533] chrY [27190093, 27190170] - | 227337
> -------
> seqinfo: 93 sequences (1 circular) from hg19 genome
> > cdsrl = as(cdsreg,'RangesList')
> > names(cdsrl)
> [1] "chr1" "chr2" "chr3" "chr4"
> [5] "chr5" "chr6" "chr7" "chr8"
> [9] "chr9" "chr10" "chr11" "chr12"
> [13] "chr13" "chr14" "chr15" "chr16"
> [17] "chr17" "chr18" "chr19" "chr20"
> [21] "chr21" "chr22" "chrX" "chrY"
> [25] "chrM" "chr1_gl000191_random" "chr1_gl000192_random" "chr4_ctg9_hap1"
> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1" "chr6_cox_hap2"
> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5" "chr6_qbl_hap6"
> [37] "chr6_ssto_hap7" "chr7_gl000195_random" "chr8_gl000196_random" "chr8_gl000197_random"
> [41] "chr9_gl000198_random" "chr9_gl000199_random" "chr9_gl000200_random" "chr9_gl000201_random"
> [45] "chr11_gl000202_random" "chr17_ctg5_hap1" "chr17_gl000203_random" "chr17_gl000204_random"
> [49] "chr17_gl000205_random" "chr17_gl000206_random" "chr18_gl000207_random" "chr19_gl000208_random"
> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211" "chrUn_gl000212"
> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215" "chrUn_gl000216"
> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219" "chrUn_gl000220"
> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223" "chrUn_gl000224"
> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227" "chrUn_gl000228"
> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231" "chrUn_gl000232"
> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235" "chrUn_gl000236"
> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239" "chrUn_gl000240"
> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243" "chrUn_gl000244"
> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247" "chrUn_gl000248"
> [93] "chrUn_gl000249"
> > names(cov)
> [1] "chrM" "chr1" "chr2" "chr3"
> [5] "chr4" "chr5" "chr6" "chr7"
> [9] "chr8" "chr9" "chr10" "chr11"
> [13] "chr12" "chr13" "chr14" "chr15"
> [17] "chr16" "chr17" "chr18" "chr19"
> [21] "chr20" "chr21" "chr22" "chrX"
> [25] "chrY" "chr1_gl000191_random" "chr1_gl000192_random" "chr4_ctg9_hap1"
> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1" "chr6_cox_hap2"
> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5" "chr6_qbl_hap6"
> [37] "chr6_ssto_hap7" "chr7_gl000195_random" "chr8_gl000196_random" "chr8_gl000197_random"
> [41] "chr9_gl000198_random" "chr9_gl000199_random" "chr9_gl000200_random" "chr9_gl000201_random"
> [45] "chr11_gl000202_random" "chr17_ctg5_hap1" "chr17_gl000203_random" "chr17_gl000204_random"
> [49] "chr17_gl000205_random" "chr17_gl000206_random" "chr18_gl000207_random" "chr19_gl000208_random"
> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211" "chrUn_gl000212"
> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215" "chrUn_gl000216"
> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219" "chrUn_gl000220"
> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223" "chrUn_gl000224"
> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227" "chrUn_gl000228"
> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231" "chrUn_gl000232"
> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235" "chrUn_gl000236"
> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239" "chrUn_gl000240"
> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243" "chrUn_gl000244"
> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247" "chrUn_gl000248"
> [93] "chrUn_gl000249"
> > covView = Views(cov,cdsrl)
> > covView[[1]]
> Views on a 16571-length Rle subject
>
> views:
> start end width
> [1] 12190 12227 38 [1367 1357 1363 1358 1347 1375 1381 1379 1381 1387 1385 1377 1382 1368 1363 ...]
> [2] 12595 12721 127 [1410 1416 1414 1421 1430 1430 1428 1432 1428 1419 1421 1418 1426 1427 1439 ...]
> [3] 13403 13639 237 [1476 1468 1460 1461 1465 1455 1448 1448 1442 1448 1460 1460 1458 1435 1440 ...]
> [4] 69091 70008 918 [ ]
> [5] 324343 324345 3 [ ]
> [6] 324439 325605 1167 [ ]
> [7] 324515 324686 172 [ ]
> [8] 324719 325124 406 [ ]
> [9] 325383 325605 223 [ ]
> ... ... ... ... ...
> [23550] 249149924 249150145 222 [ ]
> [23551] 249150487 249150533 47 [ ]
> [23552] 249150487 249150621 135 [ ]
> [23553] 249150713 249150761 49 [ ]
> [23554] 249151433 249151696 264 [ ]
> [23555] 249152027 249152058 32 [ ]
> [23556] 249152330 249152508 179 [ ]
> [23557] 249152330 249152520 191 [ ]
> [23558] 249152711 249152713 3 [ ]
>
> > sessionInfo()
> R Under development (unstable) (2014-11-18 r66997)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
> [3] AnnotationDbi_1.29.17 Biobase_2.27.1
> [5] GenomicAlignments_1.3.27 VariantAnnotation_1.13.24
> [7] Rsamtools_1.19.26 Biostrings_2.35.7
> [9] XVector_0.7.3 GenomicRanges_1.19.35
> [11] GenomeInfoDb_1.3.12 IRanges_2.1.35
> [13] S4Vectors_0.5.17 BiocGenerics_0.13.4
> [15] roxygen2_4.1.0 BiocInstaller_1.17.5
>
> loaded via a namespace (and not attached):
> [1] BBmisc_1.8 BSgenome_1.35.16 BatchJobs_1.5 BiocParallel_1.1.12 DBI_0.3.1
> [6] RCurl_1.95-4.5 RSQLite_1.0.0 Rcpp_0.11.4 XML_3.98-1.1 base64enc_0.1-2
> [11] biomaRt_2.23.5 bitops_1.0-6 brew_1.0-6 checkmate_1.5.1 codetools_0.2-10
> [16] digest_0.6.8 fail_1.2 foreach_1.4.2 iterators_1.0.7 rtracklayer_1.27.7
> [21] sendmailR_1.2-1 stringr_0.6.2 tools_3.2.0 zlibbioc_1.13.0
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
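Sean's suggestion (check the ordering and seqlevels, and reorder if necessary) can be prototyped outside the package as a small wrapper. The sketch below is an illustration under stated assumptions, not an IRanges API: align_by_name() is a hypothetical helper that works on any named list-like objects supporting names() and character subsetting with `[`, which RleList and RangesList both do. It is exercised here on plain lists, and the final Views() call is left commented out because it needs IRanges loaded.

```r
# Hypothetical helper, not an IRanges function: align two named list-like
# objects (e.g. a coverage RleList and a RangesList) by name before pairing.
align_by_name <- function(subject, ranges, strict = FALSE) {
  if (is.null(names(subject)) || is.null(names(ranges)))
    stop("both objects must have names")
  # Names in 'ranges' that have no matching element in 'subject'
  unmatched <- setdiff(names(ranges), names(subject))
  if (strict && length(unmatched) > 0L)
    stop("no subject element for: ", paste(unmatched, collapse = ", "))
  # Keep the order of 'ranges', dropping names missing from 'subject'
  keep <- intersect(names(ranges), names(subject))
  list(subject = subject[keep], ranges = ranges[keep])
}

# Usage with plain lists standing in for the real objects (orders differ,
# as with chrM first in the coverage but last in the RangesList)
cov_list <- list(chrM = "covM", chr1 = "cov1", chr2 = "cov2")
win_list <- list(chr1 = "win1", chr2 = "win2", chrM = "winM")
aligned <- align_by_name(cov_list, win_list)
names(aligned$subject)  # "chr1" "chr2" "chrM"
# With IRanges loaded, the aligned pair could then be passed on, e.g.
# Views(aligned$subject, aligned$ranges)
```

The strict argument is a design choice for the sketch: by default, ranges on sequences absent from the coverage are silently dropped (as in the intersect() idiom), while strict = TRUE turns that into an error.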