[Bioc-devel] VariantAnnotation::readVcf() sets the wrong seqlevelsStyle in devel
Hervé Pagès
hpages at fredhutch.org
Tue Aug 4 18:29:34 CEST 2020
Hi Robert,
The VCF file uses "22" for the chromosome name, which is the name used by
NCBI. Explicitly specifying "hg19" in the readVcf() call therefore amounts
to saying that this chromosome name is a UCSC name, which is why
seqlevelsStyle() gets confused later.
If you specify the name of the NCBI assembly, things work as expected:
  library(VariantAnnotation)

  fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
  vcf <- readVcf(fl, "GRCh37")
  seqlevels(vcf)
  # [1] "22"
  seqlevelsStyle(vcf)
  # [1] "NCBI"
  seqlevelsStyle(vcf) <- "UCSC"
  seqlevels(vcf)
  # [1] "chr22"
Or, if you don't know what reference genome the file is based on, don't
specify it:
  fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
  vcf <- readVcf(fl)
  seqlevels(vcf)
  # [1] "22"
  seqlevelsStyle(vcf)
  # [1] "NCBI"    "Ensembl"
  seqlevelsStyle(vcf) <- "UCSC"
  seqlevels(vcf)
  # [1] "chr22"
or specify it later:
  genome(vcf) <- "hg19"
  seqinfo(vcf)
  # Seqinfo object with 1 sequence from hg19 genome; no seqlengths:
  #   seqnames seqlengths isCircular genome
  #   chr22            NA         NA   hg19
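The same convert-then-label order can be tried without a VCF file. This is
only a minimal sketch, assuming GenomeInfoDb and GenomicRanges are installed;
the toy GRanges object `gr` and its coordinates are made up for illustration
and do not come from chr22.vcf.gz:

```r
library(GenomeInfoDb)
library(GenomicRanges)

# Hypothetical toy range on a chromosome named "22" (no "chr" prefix),
# i.e. the same naming that the VCF file uses.
gr <- GRanges(seqnames = "22", ranges = IRanges(start = 100, end = 200))

# No genome is set yet, so the style is inferred from the names alone.
seqlevelsStyle(gr)

# Rename the sequences to UCSC style first ...
seqlevelsStyle(gr) <- "UCSC"

# ... and only then label the object with the UCSC genome name.
genome(gr) <- "hg19"

seqlevels(gr)
# [1] "chr22"
```

Renaming before labeling keeps the sequence names and the genome label in
agreement, so seqlevelsStyle() has nothing contradictory to reconcile.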
Hope this helps,
H.
On 7/29/20 08:30, Robert Castelo wrote:
> hi,
>
> it looks like either VariantAnnotation::readVcf() or something in the
> CollapsedVCF class broke in devel with respect to reading and setting
> sequence styles:
>
> library(VariantAnnotation)
>
> fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> vcf <- readVcf(fl, "hg19")
> seqlevels(vcf)
> [1] "22"
> seqlevelsStyle(vcf)
> [1] "UCSC"
> seqlevelsStyle(vcf) <- "UCSC"
> seqlevels(vcf)
> [1] "22"
>
> you can find my session information below. let me know if you want me to
> open an issue at the GitHub repo (VariantAnnotation or GenomeInfoDb?).
>
> thanks!
>
> robert.
>
> BiocManager::version()
> [1] ‘3.12’
> sessionInfo()
> R version 4.0.0 (2020-04-24)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.4 LTS
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] VariantAnnotation_1.35.3 Rsamtools_2.5.3
> [3] Biostrings_2.57.2 XVector_0.29.3
> [5] SummarizedExperiment_1.19.6 DelayedArray_0.15.7
> [7] matrixStats_0.56.0 Matrix_1.2-18
> [9] Biobase_2.49.0 GenomicRanges_1.41.5
> [11] GenomeInfoDb_1.25.8 IRanges_2.23.10
> [13] S4Vectors_0.27.12 BiocGenerics_0.35.4
> [15] BiocManager_1.30.10
>
> loaded via a namespace (and not attached):
> [1] progress_1.2.2 tidyselect_1.1.0 purrr_0.3.4
> [4] lattice_0.20-41 vctrs_0.3.1 generics_0.0.2
> [7] BiocFileCache_1.13.0 rtracklayer_1.49.4 GenomicFeatures_1.41.2
> [10] blob_1.2.1 XML_3.99-0.4 rlang_0.4.6
> [13] pillar_1.4.4 glue_1.4.1 DBI_1.1.0
> [16] rappdirs_0.3.1 BiocParallel_1.23.2 bit64_0.9-7.1
> [19] dbplyr_1.4.4 GenomeInfoDbData_1.2.3 lifecycle_0.2.0
> [22] stringr_1.4.0 zlibbioc_1.35.0 memoise_1.1.0
> [25] biomaRt_2.45.2 curl_4.3 AnnotationDbi_1.51.3
> [28] Rcpp_1.0.4.6 BSgenome_1.57.5 openssl_1.4.1
> [31] bit_1.1-15.2 hms_0.5.3 askpass_1.1
> [34] digest_0.6.25 stringi_1.4.6 dplyr_1.0.0
> [37] grid_4.0.0 tools_4.0.0 bitops_1.0-6
> [40] magrittr_1.5 RCurl_1.98-1.2 RSQLite_2.2.0
> [43] tibble_3.0.1 crayon_1.3.4 pkgconfig_2.0.3
> [46] ellipsis_0.3.1 prettyunits_1.1.1 assertthat_0.2.1
> [49] httr_1.4.1 R6_2.4.1 GenomicAlignments_1.25.3
> [52] compiler_4.0.0
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319