[Bioc-devel] glitches with releaseName() and seqlevelsStyle()

Pages, Herve hp@ge@ @end|ng |rom |redhutch@org
Wed Sep 2 21:44:50 CEST 2020


Hi Robert,



On 9/2/20 11:12, Robert Castelo wrote:
> hi,
> 
> i've found the following glitches with 'releaseName()' and 
> 'seqlevelsStyle()', i guess due to recent changes in BSgenome and 
> GenomeInfoDb, which is why i'm cc'ing Hervé ;)
> 
> ===========BioC release 3.11 everything OK=============
> library(BSgenome.Hsapiens.UCSC.hg19)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> 
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> 
> releaseName(Hsapiens)
> [1] "Genome Reference Consortium GRCh37.p13"
> releaseName(as(Hsapiens, "GenomeDescription"))
> [1] "Genome Reference Consortium GRCh37.p13"
> 
> seqlevelsStyle(txdb)
> [1] "UCSC"
> seqlevelsStyle(txdb) <- "NCBI"
> seqlevelsStyle(txdb)
> [1] "NCBI"    "Ensembl"
> head(seqlengths(txdb))
>          1         2         3         4         5 6
> 249250621 243199373 198022430 191154276 180915260 171115067
> =======================================================
> 
> ============BioC devel 3.12 inconsistencies============
> library(BSgenome.Hsapiens.UCSC.hg19)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> 
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> 
> releaseName(Hsapiens)
> Error in (function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘releaseName’ for 
> signature ‘"BSgenome"’
> releaseName(as(Hsapiens, "GenomeDescription"))
> [1] NA
> 
> seqlevelsStyle(txdb)
> [1] "UCSC"
> seqlevelsStyle(txdb) <- "NCBI"
> seqlevelsStyle(txdb)
> [1] "NCBI" "UCSC"
> head(seqlengths(txdb))
>          1         2         3         4         5 6
> 249250621 243199373 198022430 191154276 180915260 171115067
> ========================================================
> 
> in the case of 'releaseName()' i interpret that the method has become 
> defunct for 'BSgenome' objects, but i think it should give a sensible 
> result with a 'GenomeDescription' object, so something may be going 
> wrong with the coercion from 'BSgenome' to 'GenomeDescription'.

I'm dropping the notion of "release name" for BSgenome objects in BioC 
3.12. For backward compatibility I just restored a temporary 
releaseName() method (in BSgenome 1.57.6) that does the following:

 > releaseName(BSgenome.Hsapiens.UCSC.hg19)
[1] NA
Warning message:
   Starting with Bioconductor 3.12, BSgenome objects no longer have a
   "release name". As a consequence of this change calling releaseName()
   on a BSgenome object now always returns NA and is deprecated.

> 
> in the case of 'seqlevelsStyle()', while the change of style seems to 
> happen, i don't see how the style can be simultaneously NCBI and UCSC.

This happens when you end up with a mix of styles. In the case of hg19, 
all the sequences get renamed from UCSC to NCBI names **except** chrM:

   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

   table(genome(txdb))
   # hg19
   #   93

   seqlevelsStyle(txdb) <- "NCBI"

   seqlevelsStyle(txdb)
   # [1] "NCBI" "UCSC"

   table(genome(txdb))
   # GRCh37.p13       hg19
   #         92          1

   genome(txdb)[genome(txdb) == "hg19"]
   #   chrM
   # "hg19"

chrM cannot and should not be renamed to MT because the MT sequence in 
GRCh37.p13 is NOT the same as the chrM sequence in hg19. So the new 
seqlevelsStyle() behavior fixes a long standing bug.

See previous discussion on this list about the long dance between UCSC 
and NCBI about the mitochondrian sequence in hg19/GRCh37/GRCh37.p13: 
https://stat.ethz.ch/pipermail/bioc-devel/2020-August/017086.html

Hope this helps,
H.

> 
> below my session information for the BioC devel 3.12 version.
> 
> thanks!!
> 
> robert.
> 
> sessionInfo()
> R version 4.0.0 (2020-04-24)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.4 LTS
> 
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> 
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
>   [2] GenomicFeatures_1.41.2
>   [3] AnnotationDbi_1.51.3
>   [4] Biobase_2.49.0
>   [5] BSgenome.Hsapiens.UCSC.hg19_1.4.3
>   [6] BSgenome_1.57.5
>   [7] rtracklayer_1.49.5
>   [8] Biostrings_2.57.2
>   [9] XVector_0.29.3
> [10] GenomicRanges_1.41.6
> [11] GenomeInfoDb_1.25.10
> [12] IRanges_2.23.10
> [13] S4Vectors_0.27.12
> [14] BiocGenerics_0.35.4
> [15] BiocManager_1.30.10
> 
> loaded via a namespace (and not attached):
>   [1] SummarizedExperiment_1.19.6 progress_1.2.2
>   [3] tidyselect_1.1.0 purrr_0.3.4
>   [5] lattice_0.20-41 generics_0.0.2
>   [7] vctrs_0.3.1 BiocFileCache_1.13.1
>   [9] blob_1.2.1 XML_3.99-0.4
> [11] rlang_0.4.6 pillar_1.4.4
> [13] glue_1.4.1 DBI_1.1.0
> [15] rappdirs_0.3.1 BiocParallel_1.23.2
> [17] bit64_0.9-7.1 dbplyr_1.4.4
> [19] matrixStats_0.56.0 GenomeInfoDbData_1.2.3
> [21] lifecycle_0.2.0 stringr_1.4.0
> [23] zlibbioc_1.35.0 memoise_1.1.0
> [25] biomaRt_2.45.2 curl_4.3
> [27] Rcpp_1.0.4.6 openssl_1.4.1
> [29] DelayedArray_0.15.7 bit_1.1-15.2
> [31] Rsamtools_2.5.3 hms_0.5.3
> [33] askpass_1.1 digest_0.6.25
> [35] stringi_1.4.6 dplyr_1.0.0
> [37] grid_4.0.0 tools_4.0.0
> [39] bitops_1.0-6 magrittr_1.5
> [41] RCurl_1.98-1.2 RSQLite_2.2.0
> [43] tibble_3.0.1 crayon_1.3.4
> [45] pkgconfig_2.0.3 ellipsis_0.3.1
> [47] Matrix_1.2-18 prettyunits_1.1.1
> [49] assertthat_0.2.1 httr_1.4.1
> [51] R6_2.4.1 GenomicAlignments_1.25.3
> [53] compiler_4.0.0
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


More information about the Bioc-devel mailing list