[Bioc-devel] VariantAnnotation::readVcf() sets the wrong seqlevelsStyle in devel

Hervé Pagès hp@ge@ @end|ng |rom |redhutch@org
Tue Aug 4 18:29:34 CEST 2020


Hi Robert,

The VCF file uses "22" for the chromosome name which is the name used by 
NCBI. So explicitly specifying "hg19" in the readVcf() call is like 
saying that this chromosome name is a UCSC name which is why 
seqlevelsStyle() gets confused later.

If you specify the name of the NCBI assembly, things work as expected:

   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl, "GRCh37")
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"

Or, if you don't know what reference genome the file is based on, don't 
specify it:

   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl)
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"    "Ensembl"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"

or specify it later:

   genome(vcf) <- "hg19"
   seqinfo(vcf)
   # Seqinfo object with 1 sequence from hg19 genome; no seqlengths:
   #   seqnames seqlengths isCircular genome
   #   chr22            NA         NA   hg19

Hope this helps,
H.


On 7/29/20 08:30, Robert Castelo wrote:
> hi,
> 
> it looks like either VariantAnnotation::readVcf() or something in the 
> CollapsedVCF class broke in devel with respect to reading and setting 
> sequence styles:
> 
> library(VariantAnnotation)
> 
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> seqlevels(vcf)
> [1] "22"
> seqlevelsStyle(vcf)
> [1] "UCSC"
> seqlevelsStyle(vcf) <- "UCSC"
> seqlevels(vcf)
> [1] "22"
> 
> you can find my session information below. let me know if you want me to 
> open an issue at the GitHub repo (VariantAnnotatoin or GenomeInfoDb?).
> 
> thanks!
> 
> robert.
> 
> BiocManager::version()
> [1] ‘3.12’
> sessionInfo()
> R version 4.0.0 (2020-04-24)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.4 LTS
> 
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> 
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>   [1] VariantAnnotation_1.35.3 Rsamtools_2.5.3
>   [3] Biostrings_2.57.2 XVector_0.29.3
>   [5] SummarizedExperiment_1.19.6 DelayedArray_0.15.7
>   [7] matrixStats_0.56.0 Matrix_1.2-18
>   [9] Biobase_2.49.0 GenomicRanges_1.41.5
> [11] GenomeInfoDb_1.25.8 IRanges_2.23.10
> [13] S4Vectors_0.27.12 BiocGenerics_0.35.4
> [15] BiocManager_1.30.10
> 
> loaded via a namespace (and not attached):
>   [1] progress_1.2.2           tidyselect_1.1.0 purrr_0.3.4
>   [4] lattice_0.20-41          vctrs_0.3.1 generics_0.0.2
>   [7] BiocFileCache_1.13.0     rtracklayer_1.49.4 GenomicFeatures_1.41.2
> [10] blob_1.2.1               XML_3.99-0.4 rlang_0.4.6
> [13] pillar_1.4.4             glue_1.4.1 DBI_1.1.0
> [16] rappdirs_0.3.1           BiocParallel_1.23.2 bit64_0.9-7.1
> [19] dbplyr_1.4.4             GenomeInfoDbData_1.2.3 lifecycle_0.2.0
> [22] stringr_1.4.0            zlibbioc_1.35.0 memoise_1.1.0
> [25] biomaRt_2.45.2           curl_4.3 AnnotationDbi_1.51.3
> [28] Rcpp_1.0.4.6             BSgenome_1.57.5 openssl_1.4.1
> [31] bit_1.1-15.2             hms_0.5.3 askpass_1.1
> [34] digest_0.6.25            stringi_1.4.6 dplyr_1.0.0
> [37] grid_4.0.0               tools_4.0.0 bitops_1.0-6
> [40] magrittr_1.5             RCurl_1.98-1.2 RSQLite_2.2.0
> [43] tibble_3.0.1             crayon_1.3.4 pkgconfig_2.0.3
> [46] ellipsis_0.3.1           prettyunits_1.1.1 assertthat_0.2.1
> [49] httr_1.4.1               R6_2.4.1 GenomicAlignments_1.25.3
> [52] compiler_4.0.0
> 
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=gp0KKC6W1uS1YnyFI5iSuxF5WSUpOhbHwL94GRP8yu0&s=Co1P5SErF64uPYhHMltM3De48hQLl-XHK3gfZOEnSKc&e= 
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list