[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?
Sean Davis
seandavi at gmail.com
Sat Feb 21 11:41:29 CET 2015
Thanks, Hervé!
Sean
On Fri, Feb 20, 2015 at 9:34 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
> Hi guys,
>
> Agreed. This is addressed in IRanges 2.1.40. Now you get a warning:
>
> > Views(RleList(b=21:24, a=11:16), RangesList(a=IRanges(2:3, 6),
> b=IRanges(1, 4)))
> RleViewsList of length 2
> names(2): a b
> Warning message:
> In RleViewsList(rleList = subject, rangesList = start) :
> 'rleList' was reordered so that its names match the names on
> 'rangesList'
>
> or an error:
>
> > Views(RleList(b=21:24, c=11:16), RangesList(a=IRanges(2:3, 6),
> b=IRanges(1, 4)))
> Error in RleViewsList(rleList = subject, rangesList = start) :
> when both 'rleList' and 'rangesList' have names, the set of names must
> be the same on each object
>
> depending on the gravity of the situation.
>
> Thanks for your feedback.
> H.
>
>
>
> On 02/20/2015 07:38 AM, Kasper Daniel Hansen wrote:
>
>> Clearly the Views method has to handle that, and in my opinion return it
>> in
>> the order of cds list in this case. Anything else is way too fragile.
>>
>> On Fri, Feb 20, 2015 at 8:14 AM, Sean Davis <seandavi at gmail.com> wrote:
>>
>> On Fri, Feb 20, 2015 at 8:10 AM, Malcolm Perry <mgperry32 at gmail.com>
>>> wrote:
>>>
>>> Hi Sean,
>>>>
>>>> The idiom I've most often seen for getting round this problem is to
>>>> first
>>>> find matching chromosomes in the RleList, and subset the RangesList
>>>> based
>>>> on this order:
>>>>
>>>> chrs = intersect(names(rle_list), as.character(seqlevels(windows)))
>>>> myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])
>>>>
>>>>
>>> Thanks, Malcolm.
>>>
>>> This works just fine and fixes the problem. However, I noticed that
>>> creating the Views() without doing what you suggest above does not result
>>> in a warning or error and results in an object that "works" but is
>>> nonsensical. I simply want to protect myself from making the same
>>> mistake
>>> twice by some checking inside the Views() method.
>>>
>>> Sean
>>>
>>>
>>>
>>>
>>>> Hope this helps,
>>>>
>>>> Malcolm
>>>>
>>>> On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com>
>>>> wrote:
>>>>
>>>> I am calculating coverage metrics of a BAM file on the CDS regions.
>>>>>
>>>> When
>>>
>>>> I
>>>>> form the RangesList and do coverage(), the resulting coverage vector
>>>>> applies the views to the regions from the RangesList without checking
>>>>> on
>>>>> matching ordering or seqlevels of the RleList and the RangesList. This
>>>>> results, in this case, in Views from chr1 being applied to the coverage
>>>>> for
>>>>> chrM, for example. Would it make sense to have the views method check
>>>>>
>>>> the
>>>
>>>> ordering and seqlevels (and perhaps even do the reordering, if
>>>>>
>>>> necessary)?
>>>
>>>>
>>>>> Example code showing the problem (not fully reproducible--sorry).
>>>>>
>>>>> Thanks,
>>>>> Sean
>>>>>
>>>>>
>>>>> cdsreg
>>>>>>
>>>>> GRanges object with 237533 ranges and 1 metadata column:
>>>>> seqnames ranges strand | cds_id
>>>>> <Rle> <IRanges> <Rle> | <integer>
>>>>> [1] chr1 [ 12190, 12227] + | 1
>>>>> [2] chr1 [ 12595, 12721] + | 2
>>>>> [3] chr1 [ 13403, 13639] + | 3
>>>>> [4] chr1 [ 69091, 70008] + | 4
>>>>> [5] chr1 [324343, 324345] + | 5
>>>>> ... ... ... ... ... ...
>>>>> [237529] chrY [26959330, 26959332] - | 227333
>>>>> [237530] chrY [27184245, 27184263] - | 227334
>>>>> [237531] chrY [27184956, 27185061] - | 227335
>>>>> [237532] chrY [27187916, 27188033] - | 227336
>>>>> [237533] chrY [27190093, 27190170] - | 227337
>>>>> -------
>>>>> seqinfo: 93 sequences (1 circular) from hg19 genome
>>>>>
>>>>>> cdsrl = as(cdsreg,'RangesList')
>>>>>> names(cdsrl)
>>>>>>
>>>>> [1] "chr1" "chr2" "chr3"
>>>>> "chr4"
>>>>> [5] "chr5" "chr6" "chr7"
>>>>> "chr8"
>>>>> [9] "chr9" "chr10" "chr11"
>>>>> "chr12"
>>>>> [13] "chr13" "chr14" "chr15"
>>>>> "chr16"
>>>>> [17] "chr17" "chr18" "chr19"
>>>>> "chr20"
>>>>> [21] "chr21" "chr22" "chrX"
>>>>> "chrY"
>>>>> [25] "chrM" "chr1_gl000191_random"
>>>>> "chr1_gl000192_random"
>>>>> "chr4_ctg9_hap1"
>>>>> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1"
>>>>> "chr6_cox_hap2"
>>>>> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5"
>>>>> "chr6_qbl_hap6"
>>>>> [37] "chr6_ssto_hap7" "chr7_gl000195_random"
>>>>> "chr8_gl000196_random"
>>>>> "chr8_gl000197_random"
>>>>> [41] "chr9_gl000198_random" "chr9_gl000199_random"
>>>>> "chr9_gl000200_random"
>>>>> "chr9_gl000201_random"
>>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>> "chrUn_gl000212"
>>>>> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215"
>>>>> "chrUn_gl000216"
>>>>> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219"
>>>>> "chrUn_gl000220"
>>>>> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223"
>>>>> "chrUn_gl000224"
>>>>> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227"
>>>>> "chrUn_gl000228"
>>>>> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231"
>>>>> "chrUn_gl000232"
>>>>> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235"
>>>>> "chrUn_gl000236"
>>>>> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239"
>>>>> "chrUn_gl000240"
>>>>> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243"
>>>>> "chrUn_gl000244"
>>>>> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247"
>>>>> "chrUn_gl000248"
>>>>> [93] "chrUn_gl000249"
>>>>>
>>>>>> names(cov)
>>>>>>
>>>>> [1] "chrM" "chr1" "chr2"
>>>>> "chr3"
>>>>> [5] "chr4" "chr5" "chr6"
>>>>> "chr7"
>>>>> [9] "chr8" "chr9" "chr10"
>>>>> "chr11"
>>>>> [13] "chr12" "chr13" "chr14"
>>>>> "chr15"
>>>>> [17] "chr16" "chr17" "chr18"
>>>>> "chr19"
>>>>> [21] "chr20" "chr21" "chr22"
>>>>> "chrX"
>>>>> [25] "chrY" "chr1_gl000191_random"
>>>>> "chr1_gl000192_random"
>>>>> "chr4_ctg9_hap1"
>>>>> [29] "chr4_gl000193_random" "chr4_gl000194_random" "chr6_apd_hap1"
>>>>> "chr6_cox_hap2"
>>>>> [33] "chr6_dbb_hap3" "chr6_mann_hap4" "chr6_mcf_hap5"
>>>>> "chr6_qbl_hap6"
>>>>> [37] "chr6_ssto_hap7" "chr7_gl000195_random"
>>>>> "chr8_gl000196_random"
>>>>> "chr8_gl000197_random"
>>>>> [41] "chr9_gl000198_random" "chr9_gl000199_random"
>>>>> "chr9_gl000200_random"
>>>>> "chr9_gl000201_random"
>>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>> "chrUn_gl000212"
>>>>> [57] "chrUn_gl000213" "chrUn_gl000214" "chrUn_gl000215"
>>>>> "chrUn_gl000216"
>>>>> [61] "chrUn_gl000217" "chrUn_gl000218" "chrUn_gl000219"
>>>>> "chrUn_gl000220"
>>>>> [65] "chrUn_gl000221" "chrUn_gl000222" "chrUn_gl000223"
>>>>> "chrUn_gl000224"
>>>>> [69] "chrUn_gl000225" "chrUn_gl000226" "chrUn_gl000227"
>>>>> "chrUn_gl000228"
>>>>> [73] "chrUn_gl000229" "chrUn_gl000230" "chrUn_gl000231"
>>>>> "chrUn_gl000232"
>>>>> [77] "chrUn_gl000233" "chrUn_gl000234" "chrUn_gl000235"
>>>>> "chrUn_gl000236"
>>>>> [81] "chrUn_gl000237" "chrUn_gl000238" "chrUn_gl000239"
>>>>> "chrUn_gl000240"
>>>>> [85] "chrUn_gl000241" "chrUn_gl000242" "chrUn_gl000243"
>>>>> "chrUn_gl000244"
>>>>> [89] "chrUn_gl000245" "chrUn_gl000246" "chrUn_gl000247"
>>>>> "chrUn_gl000248"
>>>>> [93] "chrUn_gl000249"
>>>>>
>>>>>> covView = Views(cov,cdsrl)
>>>>>> covView[[1]]
>>>>>>
>>>>> Views on a 16571-length Rle subject
>>>>>
>>>>> views:
>>>>> start end width
>>>>> [1] 12190 12227 38 [1367 1357 1363 1358 1347 1375 1381
>>>>>
>>>> 1379
>>>
>>>> 1381 1387 1385 1377 1382 1368 1363 ...]
>>>>> [2] 12595 12721 127 [1410 1416 1414 1421 1430 1430 1428
>>>>>
>>>> 1432
>>>
>>>> 1428 1419 1421 1418 1426 1427 1439 ...]
>>>>> [3] 13403 13639 237 [1476 1468 1460 1461 1465 1455 1448
>>>>>
>>>> 1448
>>>
>>>> 1442 1448 1460 1460 1458 1435 1440 ...]
>>>>> [4] 69091 70008 918 [ ]
>>>>> [5] 324343 324345 3 [ ]
>>>>> [6] 324439 325605 1167 [ ]
>>>>> [7] 324515 324686 172 [ ]
>>>>> [8] 324719 325124 406 [ ]
>>>>> [9] 325383 325605 223 [ ]
>>>>> ... ... ... ... ...
>>>>> [23550] 249149924 249150145 222 [ ]
>>>>> [23551] 249150487 249150533 47 [ ]
>>>>> [23552] 249150487 249150621 135 [ ]
>>>>> [23553] 249150713 249150761 49 [ ]
>>>>> [23554] 249151433 249151696 264 [ ]
>>>>> [23555] 249152027 249152058 32 [ ]
>>>>> [23556] 249152330 249152508 179 [ ]
>>>>> [23557] 249152330 249152520 191 [ ]
>>>>> [23558] 249152711 249152713 3 [ ]
>>>>>
>>>>> sessionInfo()R Under development (unstable) (2014-11-18 r66997)
>>>>>>
>>>>>
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats4 parallel stats graphics grDevices utils
>>>>> datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
>>>>> [3] AnnotationDbi_1.29.17 Biobase_2.27.1
>>>>> [5] GenomicAlignments_1.3.27 VariantAnnotation_1.13.24
>>>>> [7] Rsamtools_1.19.26 Biostrings_2.35.7
>>>>> [9] XVector_0.7.3 GenomicRanges_1.19.35
>>>>> [11] GenomeInfoDb_1.3.12 IRanges_2.1.35
>>>>> [13] S4Vectors_0.5.17 BiocGenerics_0.13.4
>>>>> [15] roxygen2_4.1.0 BiocInstaller_1.17.5
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] BBmisc_1.8 BSgenome_1.35.16 BatchJobs_1.5
>>>>> BiocParallel_1.1.12 DBI_0.3.1
>>>>> [6] RCurl_1.95-4.5 RSQLite_1.0.0 Rcpp_0.11.4
>>>>> XML_3.98-1.1 base64enc_0.1-2
>>>>> [11] biomaRt_2.23.5 bitops_1.0-6 brew_1.0-6
>>>>> checkmate_1.5.1 codetools_0.2-10
>>>>> [16] digest_0.6.8 fail_1.2 foreach_1.4.2
>>>>> iterators_1.0.7 rtracklayer_1.27.7
>>>>> [21] sendmailR_1.2-1 stringr_0.6.2 tools_3.2.0
>>>>> zlibbioc_1.13.0
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>
>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
[[alternative HTML version deleted]]
More information about the Bioc-devel
mailing list