[Bioc-devel] VariantAnnotation::readVcf() sets the wrong seqlevelsStyle in devel
    Hervé Pagès 
    hp@ge@ @end|ng |rom |redhutch@org
       
    Tue Aug  4 18:29:34 CEST 2020
    
    
  
Hi Robert,
The VCF file uses "22" for the chromosome name which is the name used by 
NCBI. So explicitly specifying "hg19" in the readVcf() call is like 
saying that this chromosome name is a UCSC name which is why 
seqlevelsStyle() gets confused later.
If you specify the name of the NCBI assembly, things work as expected:
   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl, "GRCh37")
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"
Or, if you don't know what reference genome the file is based on, don't 
specify it:
   fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
   vcf <- readVcf(fl)
   seqlevels(vcf)
   # [1] "22"
   seqlevelsStyle(vcf)
   # [1] "NCBI"    "Ensembl"
   seqlevelsStyle(vcf) <- "UCSC"
   seqlevels(vcf)
   # [1] "chr22"
or specify it later:
   genome(vcf) <- "hg19"
   seqinfo(vcf)
   # Seqinfo object with 1 sequence from hg19 genome; no seqlengths:
   #   seqnames seqlengths isCircular genome
   #   chr22            NA         NA   hg19
Hope this helps,
H.
On 7/29/20 08:30, Robert Castelo wrote:
> hi,
> 
> it looks like either VariantAnnotation::readVcf() or something in the 
> CollapsedVCF class broke in devel with respect to reading and setting 
> sequence styles:
> 
> library(VariantAnnotation)
> 
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> seqlevels(vcf)
> [1] "22"
> seqlevelsStyle(vcf)
> [1] "UCSC"
> seqlevelsStyle(vcf) <- "UCSC"
> seqlevels(vcf)
> [1] "22"
> 
> you can find my session information below. let me know if you want me to 
> open an issue at the GitHub repo (VariantAnnotatoin or GenomeInfoDb?).
> 
> thanks!
> 
> robert.
> 
> BiocManager::version()
> [1] ‘3.12’
> sessionInfo()
> R version 4.0.0 (2020-04-24)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.4 LTS
> 
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> 
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>   [1] VariantAnnotation_1.35.3 Rsamtools_2.5.3
>   [3] Biostrings_2.57.2 XVector_0.29.3
>   [5] SummarizedExperiment_1.19.6 DelayedArray_0.15.7
>   [7] matrixStats_0.56.0 Matrix_1.2-18
>   [9] Biobase_2.49.0 GenomicRanges_1.41.5
> [11] GenomeInfoDb_1.25.8 IRanges_2.23.10
> [13] S4Vectors_0.27.12 BiocGenerics_0.35.4
> [15] BiocManager_1.30.10
> 
> loaded via a namespace (and not attached):
>   [1] progress_1.2.2           tidyselect_1.1.0 purrr_0.3.4
>   [4] lattice_0.20-41          vctrs_0.3.1 generics_0.0.2
>   [7] BiocFileCache_1.13.0     rtracklayer_1.49.4 GenomicFeatures_1.41.2
> [10] blob_1.2.1               XML_3.99-0.4 rlang_0.4.6
> [13] pillar_1.4.4             glue_1.4.1 DBI_1.1.0
> [16] rappdirs_0.3.1           BiocParallel_1.23.2 bit64_0.9-7.1
> [19] dbplyr_1.4.4             GenomeInfoDbData_1.2.3 lifecycle_0.2.0
> [22] stringr_1.4.0            zlibbioc_1.35.0 memoise_1.1.0
> [25] biomaRt_2.45.2           curl_4.3 AnnotationDbi_1.51.3
> [28] Rcpp_1.0.4.6             BSgenome_1.57.5 openssl_1.4.1
> [31] bit_1.1-15.2             hms_0.5.3 askpass_1.1
> [34] digest_0.6.25            stringi_1.4.6 dplyr_1.0.0
> [37] grid_4.0.0               tools_4.0.0 bitops_1.0-6
> [40] magrittr_1.5             RCurl_1.98-1.2 RSQLite_2.2.0
> [43] tibble_3.0.1             crayon_1.3.4 pkgconfig_2.0.3
> [46] ellipsis_0.3.1           prettyunits_1.1.1 assertthat_0.2.1
> [49] httr_1.4.1               R6_2.4.1 GenomicAlignments_1.25.3
> [52] compiler_4.0.0
> 
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=gp0KKC6W1uS1YnyFI5iSuxF5WSUpOhbHwL94GRP8yu0&s=Co1P5SErF64uPYhHMltM3De48hQLl-XHK3gfZOEnSKc&e= 
> 
-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
    
    
More information about the Bioc-devel
mailing list