[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?
Hervé Pagès
hpages at fredhutch.org
Sat Feb 21 03:34:35 CET 2015
Hi guys,
Agreed. This is addressed in IRanges 2.1.40. Now you get a warning:
> Views(RleList(b=21:24, a=11:16), RangesList(a=IRanges(2:3, 6),
b=IRanges(1, 4)))
RleViewsList of length 2
names(2): a b
Warning message:
In RleViewsList(rleList = subject, rangesList = start) :
'rleList' was reordered so that its names match the names on
'rangesList'
or an error:
> Views(RleList(b=21:24, c=11:16), RangesList(a=IRanges(2:3, 6),
b=IRanges(1, 4)))
Error in RleViewsList(rleList = subject, rangesList = start) :
when both 'rleList' and 'rangesList' have names, the set of names must
be the same on each object
depending on the gravity of the situation.
Thanks for your feedback.
H.
On 02/20/2015 07:38 AM, Kasper Daniel Hansen wrote:
> Clearly the Views method has to handle that, and in my opinion return it in
> the order of cds list in this case. Anything else is way too fragile.
>
> On Fri, Feb 20, 2015 at 8:14 AM, Sean Davis <seandavi at gmail.com> wrote:
>
>> On Fri, Feb 20, 2015 at 8:10 AM, Malcolm Perry <mgperry32 at gmail.com>
>> wrote:
>>
>>> Hi Sean,
>>>
>>> The idiom I've most often seen for getting round this problem is to first
>>> find matching chromosomes in the RleList, and subset the RangesList based
>>> on this order:
>>>
>>> chrs = intersect(names(rle_list), as.character(seqlevels(windows)))
>>> myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])
>>>
>>
>> Thanks, Malcolm.
>>
>> This works just fine and fixes the problem. However, I noticed that
>> creating the Views() without doing what you suggest above does not result
>> in a warning or error and results in an object that "works" but is
>> nonsensical. I simply want to protect myself from making the same mistake
>> twice by some checking inside the Views() method.
>>
>> Sean
>>
>>
>>
>>>
>>> Hope this helps,
>>>
>>> Malcolm
>>>
>>> On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com> wrote:
>>>
>>>> I am calculating coverage metrics of a BAM file on the CDS regions.
>> When
>>>> I
>>>> form the RangesList and do coverage(), the resulting coverage vector
>>>> applies the views to the regions from the RangesList without checking on
>>>> matching ordering or seqlevels of the RleList and the RangesList. This
>>>> results, in this case, in Views from chr1 being applied to the coverage
>>>> for
>>>> chrM, for example. Would it make sense to have the views method check
>> the
>>>> ordering and seqlevels (and perhaps even do the reordering, if
>> necessary)?
>>>>
>>>> Example code showing the problem (not fully reproducible--sorry).
>>>>
>>>> Thanks,
>>>> Sean
>>>>
>>>>
>>>>> cdsreg
>>>> GRanges object with 237533 ranges and 1 metadata column:
>>>> seqnames ranges strand | cds_id
>>>> <Rle> <IRanges> <Rle> | <integer>
>>>> [1] chr1 [ 12190, 12227] + | 1
>>>> [2] chr1 [ 12595, 12721] + | 2
>>>> [3] chr1 [ 13403, 13639] + | 3
>>>> [4] chr1 [ 69091, 70008] + | 4
>>>> [5] chr1 [324343, 324345] + | 5
>>>> ... ... ... ... ... ...
>>>> [237529] chrY [26959330, 26959332] - | 227333
>>>> [237530] chrY [27184245, 27184263] - | 227334
>>>> [237531] chrY [27184956, 27185061] - | 227335
>>>> [237532] chrY [27187916, 27188033] - | 227336
>>>> [237533] chrY [27190093, 27190170] - | 227337
>>>> -------
>>>> seqinfo: 93 sequences (1 circular) from hg19 genome
>>>>> cdsrl = as(cdsreg,'RangesList')
>>>>> names(cdsrl)
>>>> [1] "chr1" "chr2" "chr3"
>>>> "chr4"
>>>> [5] "chr5" "chr6" "chr7"
>>>> "chr8"
>>>> [9] "chr9" "chr10" "chr11"
>>>> "chr12"
>>>> [13] "chr13" "chr14" "chr15"
>>>> "chr16"
>>>> [17] "chr17" "chr18" "chr19"
>>>> "chr20"
>>>> [21] "chr21" "chr22" "chrX"
>>>> "chrY"
>>>> [25] "chrM" "chr1_gl000191_random"
>>>> "chr1_gl000192_random"
>>>> "chr4_ctg9_hap1"
>>>> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1"
>>>> "chr6_cox_hap2"
>>>> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5"
>>>> "chr6_qbl_hap6"
>>>> [37] "chr6_ssto_hap7" "chr7_gl000195_random"
>>>> "chr8_gl000196_random"
>>>> "chr8_gl000197_random"
>>>> [41] "chr9_gl000198_random" "chr9_gl000199_random"
>>>> "chr9_gl000200_random"
>>>> "chr9_gl000201_random"
>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>> "chrUn_gl000212"
>>>> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215"
>>>> "chrUn_gl000216"
>>>> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219"
>>>> "chrUn_gl000220"
>>>> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223"
>>>> "chrUn_gl000224"
>>>> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227"
>>>> "chrUn_gl000228"
>>>> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231"
>>>> "chrUn_gl000232"
>>>> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235"
>>>> "chrUn_gl000236"
>>>> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239"
>>>> "chrUn_gl000240"
>>>> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243"
>>>> "chrUn_gl000244"
>>>> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247"
>>>> "chrUn_gl000248"
>>>> [93] "chrUn_gl000249"
>>>>> names(cov)
>>>> [1] "chrM" "chr1" "chr2"
>>>> "chr3"
>>>> [5] "chr4" "chr5" "chr6"
>>>> "chr7"
>>>> [9] "chr8" "chr9" "chr10"
>>>> "chr11"
>>>> [13] "chr12" "chr13" "chr14"
>>>> "chr15"
>>>> [17] "chr16" "chr17" "chr18"
>>>> "chr19"
>>>> [21] "chr20" "chr21" "chr22"
>>>> "chrX"
>>>> [25] "chrY" "chr1_gl000191_random"
>>>> "chr1_gl000192_random"
>>>> "chr4_ctg9_hap1"
>>>> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1"
>>>> "chr6_cox_hap2"
>>>> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5"
>>>> "chr6_qbl_hap6"
>>>> [37] "chr6_ssto_hap7" "chr7_gl000195_random"
>>>> "chr8_gl000196_random"
>>>> "chr8_gl000197_random"
>>>> [41] "chr9_gl000198_random" "chr9_gl000199_random"
>>>> "chr9_gl000200_random"
>>>> "chr9_gl000201_random"
>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>> "chrUn_gl000212"
>>>> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215"
>>>> "chrUn_gl000216"
>>>> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219"
>>>> "chrUn_gl000220"
>>>> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223"
>>>> "chrUn_gl000224"
>>>> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227"
>>>> "chrUn_gl000228"
>>>> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231"
>>>> "chrUn_gl000232"
>>>> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235"
>>>> "chrUn_gl000236"
>>>> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239"
>>>> "chrUn_gl000240"
>>>> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243"
>>>> "chrUn_gl000244"
>>>> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247"
>>>> "chrUn_gl000248"
>>>> [93] "chrUn_gl000249"
>>>>> covView = Views(cov,cdsrl)
>>>>> covView[[1]]
>>>> Views on a 16571-length Rle subject
>>>>
>>>> views:
>>>> start end width
>>>> [1] 12190 12227 38 [1367 1357 1363 1358 1347 1375 1381
>> 1379
>>>> 1381 1387 1385 1377 1382 1368 1363 ...]
>>>> [2] 12595 12721 127 [1410 1416 1414 1421 1430 1430 1428
>> 1432
>>>> 1428 1419 1421 1418 1426 1427 1439 ...]
>>>> [3] 13403 13639 237 [1476 1468 1460 1461 1465 1455 1448
>> 1448
>>>> 1442 1448 1460 1460 1458 1435 1440 ...]
>>>> [4] 69091 70008 918 [ ]
>>>> [5] 324343 324345 3 [ ]
>>>> [6] 324439 325605 1167 [ ]
>>>> [7] 324515 324686 172 [ ]
>>>> [8] 324719 325124 406 [ ]
>>>> [9] 325383 325605 223 [ ]
>>>> ... ... ... ... ...
>>>> [23550] 249149924 249150145 222 [ ]
>>>> [23551] 249150487 249150533 47 [ ]
>>>> [23552] 249150487 249150621 135 [ ]
>>>> [23553] 249150713 249150761 49 [ ]
>>>> [23554] 249151433 249151696 264 [ ]
>>>> [23555] 249152027 249152058 32 [ ]
>>>> [23556] 249152330 249152508 179 [ ]
>>>> [23557] 249152330 249152520 191 [ ]
>>>> [23558] 249152711 249152713 3 [ ]
>>>>
>>>>> sessionInfo()R Under development (unstable) (2014-11-18 r66997)
>>>>
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats4 parallel stats graphics grDevices utils
>>>> datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
>>>> [3] AnnotationDbi_1.29.17 Biobase_2.27.1
>>>> [5] GenomicAlignments_1.3.27 VariantAnnotation_1.13.24
>>>> [7] Rsamtools_1.19.26 Biostrings_2.35.7
>>>> [9] XVector_0.7.3 GenomicRanges_1.19.35
>>>> [11] GenomeInfoDb_1.3.12 IRanges_2.1.35
>>>> [13] S4Vectors_0.5.17 BiocGenerics_0.13.4
>>>> [15] roxygen2_4.1.0 BiocInstaller_1.17.5
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] BBmisc_1.8 BSgenome_1.35.16 BatchJobs_1.5
>>>> BiocParallel_1.1.12 DBI_0.3.1
>>>> [6] RCurl_1.95-4.5 RSQLite_1.0.0 Rcpp_0.11.4
>>>> XML_3.98-1.1 base64enc_0.1-2
>>>> [11] biomaRt_2.23.5 bitops_1.0-6 brew_1.0-6
>>>> checkmate_1.5.1 codetools_0.2-10
>>>> [16] digest_0.6.8 fail_1.2 foreach_1.4.2
>>>> iterators_1.0.7 rtracklayer_1.27.7
>>>> [21] sendmailR_1.2-1 stringr_0.6.2 tools_3.2.0
>>>> zlibbioc_1.13.0
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list