[Bioc-devel] A quick check for matching seqnames/order needed for Views on RleList?

Sean Davis seandavi at gmail.com
Sat Feb 21 11:41:29 CET 2015


Thanks, Hervé!

Sean

On Fri, Feb 20, 2015 at 9:34 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

> Hi guys,
>
> Agreed. This is addressed in IRanges 2.1.40. Now you get a warning:
>
>   >  Views(RleList(b=21:24, a=11:16), RangesList(a=IRanges(2:3, 6),
> b=IRanges(1, 4)))
>   RleViewsList of length 2
>   names(2): a b
>   Warning message:
>   In RleViewsList(rleList = subject, rangesList = start) :
>     'rleList' was reordered so that its names match the names on
>     'rangesList'
>
> or an error:
>
>   >  Views(RleList(b=21:24, c=11:16), RangesList(a=IRanges(2:3, 6),
> b=IRanges(1, 4)))
>   Error in RleViewsList(rleList = subject, rangesList = start) :
>     when both 'rleList' and 'rangesList' have names, the set of names must
>     be the same on each object
>
> depending on the gravity of the situation.
>
> Thanks for your feedback.
> H.
>
>
>
> On 02/20/2015 07:38 AM, Kasper Daniel Hansen wrote:
>
>> Clearly the Views method has to handle that, and in my opinion return it
>> in
>> the order of cds list in this case.  Anything else is way too fragile.
>>
>> On Fri, Feb 20, 2015 at 8:14 AM, Sean Davis <seandavi at gmail.com> wrote:
>>
>>  On Fri, Feb 20, 2015 at 8:10 AM, Malcolm Perry <mgperry32 at gmail.com>
>>> wrote:
>>>
>>>  Hi Sean,
>>>>
>>>> The idiom I've most often seen for getting round this problem is to
>>>> first
>>>> find matching chromosomes in the RleList, and subset the RangesList
>>>> based
>>>> on this order:
>>>>
>>>> chrs = intersect(names(rle_list), as.character(seqlevels(windows)))
>>>> myViews = Views(rle_list[chrs], as(windows, "RangesList")[chrs])
>>>>
>>>>
>>> Thanks, Malcolm.
>>>
>>> This works just fine and fixes the problem.  However, I noticed that
>>> creating the Views() without doing what you suggest above does not result
>>> in a warning or error and results in an object that "works" but is
>>> nonsensical.  I simply want to protect myself from making the same
>>> mistake
>>> twice by some checking inside the Views() method.
>>>
>>> Sean
>>>
>>>
>>>
>>>
>>>> Hope this helps,
>>>>
>>>> Malcolm
>>>>
>>>> On Fri, Feb 20, 2015 at 12:53 PM, Sean Davis <seandavi at gmail.com>
>>>> wrote:
>>>>
>>>>  I am calculating coverage metrics of a BAM file on the CDS regions.
>>>>>
>>>> When
>>>
>>>> I
>>>>> form the RangesList and do coverage(), the resulting coverage vector
>>>>> applies the views to the regions from the RangesList without checking
>>>>> on
>>>>> matching ordering or seqlevels of the RleList and the RangesList.  This
>>>>> results, in this case, in Views from chr1 being applied to the coverage
>>>>> for
>>>>> chrM, for example.  Would it make sense to have the views method check
>>>>>
>>>> the
>>>
>>>> ordering and seqlevels (and perhaps even do the reordering, if
>>>>>
>>>> necessary)?
>>>
>>>>
>>>>> Example code showing the problem (not fully reproducible--sorry).
>>>>>
>>>>> Thanks,
>>>>> Sean
>>>>>
>>>>>
>>>>>  cdsreg
>>>>>>
>>>>> GRanges object with 237533 ranges and 1 metadata column:
>>>>>             seqnames               ranges strand   |    cds_id
>>>>>                <Rle>            <IRanges>  <Rle>   | <integer>
>>>>>         [1]     chr1     [ 12190,  12227]      +   |         1
>>>>>         [2]     chr1     [ 12595,  12721]      +   |         2
>>>>>         [3]     chr1     [ 13403,  13639]      +   |         3
>>>>>         [4]     chr1     [ 69091,  70008]      +   |         4
>>>>>         [5]     chr1     [324343, 324345]      +   |         5
>>>>>         ...      ...                  ...    ... ...       ...
>>>>>    [237529]     chrY [26959330, 26959332]      -   |    227333
>>>>>    [237530]     chrY [27184245, 27184263]      -   |    227334
>>>>>    [237531]     chrY [27184956, 27185061]      -   |    227335
>>>>>    [237532]     chrY [27187916, 27188033]      -   |    227336
>>>>>    [237533]     chrY [27190093, 27190170]      -   |    227337
>>>>>    -------
>>>>>    seqinfo: 93 sequences (1 circular) from hg19 genome
>>>>>
>>>>>> cdsrl = as(cdsreg,'RangesList')
>>>>>> names(cdsrl)
>>>>>>
>>>>>   [1] "chr1"                  "chr2"                  "chr3"
>>>>>   "chr4"
>>>>>   [5] "chr5"                  "chr6"                  "chr7"
>>>>>   "chr8"
>>>>>   [9] "chr9"                  "chr10"                 "chr11"
>>>>>    "chr12"
>>>>> [13] "chr13"                 "chr14"                 "chr15"
>>>>>    "chr16"
>>>>> [17] "chr17"                 "chr18"                 "chr19"
>>>>>    "chr20"
>>>>> [21] "chr21"                 "chr22"                 "chrX"
>>>>>   "chrY"
>>>>> [25] "chrM"                  "chr1_gl000191_random"
>>>>> "chr1_gl000192_random"
>>>>>   "chr4_ctg9_hap1"
>>>>> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>>>>>    "chr6_cox_hap2"
>>>>> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>>>>>    "chr6_qbl_hap6"
>>>>> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"
>>>>> "chr8_gl000196_random"
>>>>>   "chr8_gl000197_random"
>>>>> [41] "chr9_gl000198_random"  "chr9_gl000199_random"
>>>>> "chr9_gl000200_random"
>>>>>   "chr9_gl000201_random"
>>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>>   "chrUn_gl000212"
>>>>> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>>>>>   "chrUn_gl000216"
>>>>> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>>>>>   "chrUn_gl000220"
>>>>> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>>>>>   "chrUn_gl000224"
>>>>> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>>>>>   "chrUn_gl000228"
>>>>> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>>>>>   "chrUn_gl000232"
>>>>> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>>>>>   "chrUn_gl000236"
>>>>> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>>>>>   "chrUn_gl000240"
>>>>> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>>>>>   "chrUn_gl000244"
>>>>> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>>>>>   "chrUn_gl000248"
>>>>> [93] "chrUn_gl000249"
>>>>>
>>>>>> names(cov)
>>>>>>
>>>>>   [1] "chrM"                  "chr1"                  "chr2"
>>>>>   "chr3"
>>>>>   [5] "chr4"                  "chr5"                  "chr6"
>>>>>   "chr7"
>>>>>   [9] "chr8"                  "chr9"                  "chr10"
>>>>>    "chr11"
>>>>> [13] "chr12"                 "chr13"                 "chr14"
>>>>>    "chr15"
>>>>> [17] "chr16"                 "chr17"                 "chr18"
>>>>>    "chr19"
>>>>> [21] "chr20"                 "chr21"                 "chr22"
>>>>>    "chrX"
>>>>> [25] "chrY"                  "chr1_gl000191_random"
>>>>> "chr1_gl000192_random"
>>>>>   "chr4_ctg9_hap1"
>>>>> [29] "chr4_gl000193_random"  "chr4_gl000194_random"  "chr6_apd_hap1"
>>>>>    "chr6_cox_hap2"
>>>>> [33] "chr6_dbb_hap3"         "chr6_mann_hap4"        "chr6_mcf_hap5"
>>>>>    "chr6_qbl_hap6"
>>>>> [37] "chr6_ssto_hap7"        "chr7_gl000195_random"
>>>>> "chr8_gl000196_random"
>>>>>   "chr8_gl000197_random"
>>>>> [41] "chr9_gl000198_random"  "chr9_gl000199_random"
>>>>> "chr9_gl000200_random"
>>>>>   "chr9_gl000201_random"
>>>>> [45] "chr11_gl000202_random" "chr17_ctg5_hap1"
>>>>> "chr17_gl000203_random" "chr17_gl000204_random"
>>>>> [49] "chr17_gl000205_random" "chr17_gl000206_random"
>>>>> "chr18_gl000207_random" "chr19_gl000208_random"
>>>>> [53] "chr19_gl000209_random" "chr21_gl000210_random" "chrUn_gl000211"
>>>>>   "chrUn_gl000212"
>>>>> [57] "chrUn_gl000213"        "chrUn_gl000214"        "chrUn_gl000215"
>>>>>   "chrUn_gl000216"
>>>>> [61] "chrUn_gl000217"        "chrUn_gl000218"        "chrUn_gl000219"
>>>>>   "chrUn_gl000220"
>>>>> [65] "chrUn_gl000221"        "chrUn_gl000222"        "chrUn_gl000223"
>>>>>   "chrUn_gl000224"
>>>>> [69] "chrUn_gl000225"        "chrUn_gl000226"        "chrUn_gl000227"
>>>>>   "chrUn_gl000228"
>>>>> [73] "chrUn_gl000229"        "chrUn_gl000230"        "chrUn_gl000231"
>>>>>   "chrUn_gl000232"
>>>>> [77] "chrUn_gl000233"        "chrUn_gl000234"        "chrUn_gl000235"
>>>>>   "chrUn_gl000236"
>>>>> [81] "chrUn_gl000237"        "chrUn_gl000238"        "chrUn_gl000239"
>>>>>   "chrUn_gl000240"
>>>>> [85] "chrUn_gl000241"        "chrUn_gl000242"        "chrUn_gl000243"
>>>>>   "chrUn_gl000244"
>>>>> [89] "chrUn_gl000245"        "chrUn_gl000246"        "chrUn_gl000247"
>>>>>   "chrUn_gl000248"
>>>>> [93] "chrUn_gl000249"
>>>>>
>>>>>> covView = Views(cov,cdsrl)
>>>>>> covView[[1]]
>>>>>>
>>>>> Views on a 16571-length Rle subject
>>>>>
>>>>> views:
>>>>>              start       end width
>>>>>      [1]     12190     12227    38 [1367 1357 1363 1358 1347 1375 1381
>>>>>
>>>> 1379
>>>
>>>> 1381 1387 1385 1377 1382 1368 1363 ...]
>>>>>      [2]     12595     12721   127 [1410 1416 1414 1421 1430 1430 1428
>>>>>
>>>> 1432
>>>
>>>> 1428 1419 1421 1418 1426 1427 1439 ...]
>>>>>      [3]     13403     13639   237 [1476 1468 1460 1461 1465 1455 1448
>>>>>
>>>> 1448
>>>
>>>> 1442 1448 1460 1460 1458 1435 1440 ...]
>>>>>      [4]     69091     70008   918 [ ]
>>>>>      [5]    324343    324345     3 [ ]
>>>>>      [6]    324439    325605  1167 [ ]
>>>>>      [7]    324515    324686   172 [ ]
>>>>>      [8]    324719    325124   406 [ ]
>>>>>      [9]    325383    325605   223 [ ]
>>>>>      ...       ...       ...   ... ...
>>>>> [23550] 249149924 249150145   222 [ ]
>>>>> [23551] 249150487 249150533    47 [ ]
>>>>> [23552] 249150487 249150621   135 [ ]
>>>>> [23553] 249150713 249150761    49 [ ]
>>>>> [23554] 249151433 249151696   264 [ ]
>>>>> [23555] 249152027 249152058    32 [ ]
>>>>> [23556] 249152330 249152508   179 [ ]
>>>>> [23557] 249152330 249152520   191 [ ]
>>>>> [23558] 249152711 249152713     3 [ ]
>>>>>
>>>>>  sessionInfo()R Under development (unstable) (2014-11-18 r66997)
>>>>>>
>>>>>
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats4    parallel  stats     graphics  grDevices utils
>>>>> datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.19.15
>>>>>   [3] AnnotationDbi_1.29.17                   Biobase_2.27.1
>>>>>   [5] GenomicAlignments_1.3.27                VariantAnnotation_1.13.24
>>>>>   [7] Rsamtools_1.19.26                       Biostrings_2.35.7
>>>>>   [9] XVector_0.7.3                           GenomicRanges_1.19.35
>>>>> [11] GenomeInfoDb_1.3.12                     IRanges_2.1.35
>>>>> [13] S4Vectors_0.5.17                        BiocGenerics_0.13.4
>>>>> [15] roxygen2_4.1.0                          BiocInstaller_1.17.5
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>   [1] BBmisc_1.8          BSgenome_1.35.16    BatchJobs_1.5
>>>>> BiocParallel_1.1.12 DBI_0.3.1
>>>>>   [6] RCurl_1.95-4.5      RSQLite_1.0.0       Rcpp_0.11.4
>>>>> XML_3.98-1.1        base64enc_0.1-2
>>>>> [11] biomaRt_2.23.5      bitops_1.0-6        brew_1.0-6
>>>>> checkmate_1.5.1     codetools_0.2-10
>>>>> [16] digest_0.6.8        fail_1.2            foreach_1.4.2
>>>>> iterators_1.0.7     rtracklayer_1.27.7
>>>>> [21] sendmailR_1.2-1     stringr_0.6.2       tools_3.2.0
>>>>>   zlibbioc_1.13.0
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>
>>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list