[Bioc-devel] VariantAnnotation::readVcf(fl, seqinfo(scanVcfHeader(fl)) problem

Robert Castelo robert.castelo at upf.edu
Fri Oct 24 18:05:14 CEST 2014


hi Valerie,

thanks for the quick fix and updating the documentation, i have a 
further question about the seqinfo slot and particularly the use of 
seqlevelsStyle(). Let me illustrate it with an example again:


==============
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

## read again the same VCF file
fl <- file.path(system.file("extdata", package="VariantFiltering"), 
"CEUtrio.vcf.bgz")
vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl)))

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## select the standard chromosomes
vcf <- keepStandardChromosomes(vcf)

## since the input VCF file had NCBI style, let's match
## the style of the TxDb annotations
seqlevelsStyle(vcf) <- seqlevelsStyle(txdb)

## drop the mitochondrial chromosome (b/c of the different lengths
## between b37 and hg19
vcf <- dropSeqlevels(vcf, "chrM")

## try to annotate the location of the variants. it prompts an
## error because the 'genome' slot of the Seqinfo object still
## has b37 after running seqlevelsStyle
vcf_annot <- locateVariants(vcf, txdb, AllVariants())
Error in mergeNamedAtomicVectors(genome(x), genome(y), what = 
c("sequence",  :
   sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, 
chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, 
chr20, chr21, chr22, chrX, chrY, chrM have incompatible genomes:
   - in 'x': b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, 
b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37
   - in 'y': hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, 
hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, 
hg19, hg19, hg19

## this can be fixed by setting the 'genome' slot to the values of
## the TxDb object
genome(vcf) <- genome(txdb)[intersect(names(genome(vcf)), 
names(genome(txdb)))]

## now this works
vcf_annot <- locateVariants(vcf, txdb, AllVariants())
=================

so my question is, should not seqlevelsStyle() also change the 'genome' 
slot of the Seqinfo object in the updated object?

if not, would the solution be updating the 'genome' slot in the way i 
did it?

thanks!
robert.


On 10/23/2014 11:14 PM, Valerie Obenchain wrote:
> Hi Robert,
>
> Thanks for the bug report and reproducible example. Now fixed in release
> 1.12.2 and devel 1.13.4.
>
> I've also updated the docs to better explain how the Seqinfo objects are
> propagated / merged when supplied as 'genome'.
>
> Valerie
>
>
> On 10/23/2014 06:45 AM, Robert Castelo wrote:
>> hi there,
>>
>> in my package VariantFiltering i have an example VCF file from a Hapmap
>> CEU trio including three chromosomes only to illustrate its vignette.
>> i've come across a problem with the function readVcf() in
>> VariantAnnotation that may be specific of the situation of a VCF file
>> not having all chromosomes, but which it will be great for me if this
>> could be addressed.
>>
>> the problem is reproduced as follows:
>>
>> ===========================
>> library(VariantAnnotation)
>>
>> fl <- file.path(system.file("extdata", package="VariantFiltering"),
>> "CEUtrio.vcf.bgz")
>>
>> vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl)))
>> Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = new2old,
>> seqlevels(value)) :
>> when 'new2old' is NULL, the first elements in the
>> supplied 'seqlevels' must be identical to 'seqlevels(x)'
>> ====================
>>
>> this is caused because although i'm providing the Seqinfo object derived
>> from the header of the VCF file itself, at some point the ordering of
>> the seqlevels between the header and the rest of the VCF file differs
>> due to the smaller subset of chromosomes in the VCF file.
>>
>> This can be easily fixed by replacing the line:
>>
>> if (length(newsi) > length(oldsi)) {
>>
>> within the .scanVcfToVCF() function in methods-readVcf.R, by
>>
>> if (length(newsi) >= length(oldsi)) {
>>
>> this is happening both in release and devel. i'm pasting below my
>> sessionInfo() for the release.
>>
>> let me know if you think this fix is feasible or i'm wrongly using the
>> function readVcf(). i'm basically trying to use readVcf() without having
>> to figure out the appropriate value for the argument 'genome', i.e.,
>> without knowing beforehand what version of the genome was used to
>> produce the VCF file.
>>
>> thanks!!
>> robert.
>>
>>
>> sessionInfo()
>> R version 3.1.1 Patched (2014-10-13 r66751)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> [7] LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices
>> [6] utils datasets methods base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.12.0 Rsamtools_1.18.0
>> [3] Biostrings_2.34.0 XVector_0.6.0
>> [5] GenomicRanges_1.18.0 GenomeInfoDb_1.2.0
>> [7] IRanges_2.0.0 S4Vectors_0.4.0
>> [9] BiocGenerics_0.12.0 vimcom_1.0-0
>> [11] setwidth_1.0-3 colorout_1.0-3
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.28.0 base64enc_0.1-2
>> [3] BatchJobs_1.4 BBmisc_1.7
>> [5] Biobase_2.26.0 BiocParallel_1.0.0
>> [7] biomaRt_2.22.0 bitops_1.0-6
>> [9] brew_1.0-6 BSgenome_1.34.0
>> [11] checkmate_1.4 codetools_0.2-9
>> [13] DBI_0.3.1 digest_0.6.4
>> [15] fail_1.2 foreach_1.4.2
>> [17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0
>> [19] iterators_1.0.7 RCurl_1.95-4.3
>> [21] RSQLite_0.11.4 rtracklayer_1.26.0
>> [23] sendmailR_1.2-1 stringr_0.6.2
>> [25] tools_3.1.1 XML_3.98-1.1
>> [27] zlibbioc_1.12.0
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550



More information about the Bioc-devel mailing list