[Bioc-devel] VariantAnnotation writeVcf problem
Valerie Obenchain
vobencha at fhcrc.org
Mon Dec 2 19:58:21 CET 2013
Files with a .vcf extension often get scrubbed from email because they
are interpreted as 'vCard' files. (At least this has been my
experience.) Changing the file extension to something other than '.vcf'
usually solves the problem.
I was able to reproduce the error with the file you pasted in the
message. The bug came from old code that looked for ":" in the rownames.
This legacy check is no longer necessary (I should have removed it some
time ago). Now fixed in release (1.8.7) and devel (1.9.16).
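
The failure mode described above can be illustrated with a minimal base R sketch. This is not the VariantAnnotation source: looks_autogenerated() and ids_to_write() are hypothetical names, and the sketch assumes the legacy check treated any rowname containing ":" as an auto-generated "chr:pos_ref/alt" name and wrote '.' in its place.

```r
# Hypothetical reconstruction of the legacy behaviour; NOT the actual
# VariantAnnotation code. Assumption: a rowname containing ":" was taken
# to be an auto-generated "chr:pos_ref/alt" name rather than a real ID.
looks_autogenerated <- function(ids) grepl(":", ids, fixed = TRUE)

# Replace every rowname flagged by the check with the missing-ID marker ".".
ids_to_write <- function(ids) ifelse(looks_autogenerated(ids), ".", ids)

ids <- c("rs6054257",             # dbSNP-style ID, no ":"
         "20:17330_T/A",          # name generated by readVcf() for a missing ID
         "DEL:561590:0:1:0:0:0")  # genuine structural-variant ID containing ":"

written <- ids_to_write(ids)
print(written)
```

Under this assumption the third ID is dropped even though it is a real identifier, which is consistent with the '.' values reported in the thread.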
Thanks for persevering and reporting this bug.
Valerie
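
One way to check a read/write round trip without relying on rownames() is to compare the ID column of the two files directly. The sketch below uses only base R. read_vcf_ids() is a hypothetical helper, not part of VariantAnnotation, and it assumes an uncompressed, tab-delimited VCF.

```r
# Hypothetical helper (not part of VariantAnnotation): return the ID column
# (third field) of every record line in an uncompressed, tab-delimited VCF.
read_vcf_ids <- function(path) {
  lines <- readLines(path)
  records <- lines[nzchar(lines) & !startsWith(lines, "#")]  # drop header lines
  vapply(strsplit(records, "\t", fixed = TRUE), `[`, character(1), 3)
}

# Small demonstration file built from two records quoted in this thread.
src <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  paste(c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"),
        collapse = "\t"),
  paste(c("chr20", "14855644", "DEL:561590:0:1:0:0:0", "C", "<DEL>",
          ".", "PASS", "."), collapse = "\t"),
  paste(c("chr20", "62063686", "DUP:TANDEM:572544:0:0:8:0:0", "T",
          "<DUP:TANDEM>", ".", "PASS", "."), collapse = "\t")
), src)

ids <- read_vcf_ids(src)
print(ids)
```

After writing the object back out with writeVcf(), calling identical(read_vcf_ids(src), read_vcf_ids(dest)) on the input and output paths would be expected to return TRUE if the IDs survive the round trip.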
On 11/28/2013 02:13 AM, Becq, Jennifer wrote:
> Hi Valerie,
> The VCF that is causing the problem was at the bottom of my email; I can copy-paste it here again:
>
> ##fileformat=VCFv4.1
> #CHROM POS ID REF ALT QUAL FILTER INFO
> chr20 14855644 DEL:561590:0:1:0:0:0 C <DEL> . PASS .
> chr20 29627290 BND:81424:0:1:1:1 G [chr2:114173319[G . MaxDepth .
> chr20 35365307 BND:54200:0:1:0:1 T ]chr1:230941520]T . PASS .
> chr20 60520225 DEL:572151:1:1:6:4:0 AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG A . PASS .
> chr20 60520443 DEL:572151:1:1:6:4:1 GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA GC . PASS .
> chr20 60520937 DEL:572151:1:1:11:0:0 C <DEL> . PASS .
> chr20 61766068 DEL:572433:0:0:5:2:0 CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC CGG . PASS .
> chr20 62063686 DUP:TANDEM:572544:0:0:8:0:0 T <DUP:TANDEM> . PASS .
>
>
>
>
>
> Thanks
> Jennifer
>
> Jennifer Becq
> Senior Bioinformatics Scientist
> Illumina Cambridge Ltd
> Tel: +44 (0) 1799 532300
> email: jbecq at illumina.com
>
>
>
> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: 27 November 2013 21:17
> To: Becq, Jennifer; bioc-devel at r-project.org
> Subject: Re: VariantAnnotation writeVcf problem
>
> Hi,
>
> I can't reproduce this error. Here is a read/write example using a file from VariantAnnotation where the results are as expected.
>
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> dest <- tempfile()
> vcf1 <- readVcf(fl, "hg19")
> > rownames(vcf1)
> [1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
> [5] "microsat1"
>
> writeVcf(vcf1, dest)
> vcf2 <- readVcf(dest, "hg19")
> > rownames(vcf2)
> [1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
> [5] "microsat1"
>
> I need a reproducible example in order to help. Is the vcf you're working with publicly available?
>
> Valerie
>
> On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
>> Hi Valerie,
>>
>> Thank you for cc'ing my message.
>>
>> The "ID" values are removed when reading a VCF through readVcf() and re-writing it with writeVcf():
>>
>> V = readVcf("test.vcf", "hg19")
>> rownames(V)
>> [1] "DEL:561590:0:1:0:0:0" "BND:81424:0:1:1:1"
>> [3] "BND:54200:0:1:0:1" "DEL:572151:1:1:6:4:0"
>> [5] "DEL:572151:1:1:6:4:1" "DEL:572151:1:1:11:0:0"
>> [7] "DEL:572433:0:0:5:2:0" "DUP:TANDEM:572544:0:0:8:0:0"
>> writeVcf(V, "writeTest.vcf")
>> V2 = readVcf("writeTest.vcf", "hg19")
>> rownames(V2)
>> [1] "chr20:14855644_C/<DEL>"
>> [2] "chr20:29627290_G/[chr2:114173319[G"
>> [3] "chr20:35365307_T/]chr1:230941520]T"
>> [4] "chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
>> [5] "chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
>> [6] "chr20:60520937_C/<DEL>"
>> [7] "chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
>> [8] "chr20:62063686_T/<DUP:TANDEM>"
>>
>> sessionInfo()
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.8.6 Rsamtools_1.14.2 Biostrings_2.30.1
>> [4] GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.5
>> [7] BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.24.0 Biobase_2.22.0 biomaRt_2.18.0
>> [4] bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
>> [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1 RSQLite_0.11.4
>> [10] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2
>> [13] XML_3.98-1.1 zlibbioc_1.8.0
>>
>>
>> ***** With the following VCF test.vcf:
>>
>> ##fileformat=VCFv4.1
>> #CHROM POS ID REF ALT QUAL FILTER INFO
>> chr20 14855644 DEL:561590:0:1:0:0:0 C <DEL> . PASS .
>> chr20 29627290 BND:81424:0:1:1:1 G [chr2:114173319[G . MaxDepth .
>> chr20 35365307 BND:54200:0:1:0:1 T ]chr1:230941520]T . PASS .
>> chr20 60520225 DEL:572151:1:1:6:4:0 AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG A . PASS .
>> chr20 60520443 DEL:572151:1:1:6:4:1 GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA GC . PASS .
>> chr20 60520937 DEL:572151:1:1:11:0:0 C <DEL> . PASS .
>> chr20 61766068 DEL:572433:0:0:5:2:0 CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC CGG . PASS .
>> chr20 62063686 DUP:TANDEM:572544:0:0:8:0:0 T <DUP:TANDEM> . PASS .
>>
>> Thanks
>> Jennifer
>>
>>
>> Jennifer Becq
>> Bioinformatics Scientist
>> Illumina Cambridge Ltd
>> Tel: +44 (0) 1799 532300
>> email: jbecq at illumina.com
>>
>>
>>
>> -----Original Message-----
>> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
>> Sent: 20 November 2013 17:28
>> To: Becq, Jennifer; bioc-devel at r-project.org
>> Subject: Re: VariantAnnotation writeVcf problem
>>
>> Hi Jennifer,
>>
>> I've cc'd your message to the Bioconductor mailing list. We have two
>> lists, one for general questions and the other for bug reports/feature
>> requests. Please post future questions to one of these lists instead
>> of sending them to a single person. The lists reach a wider audience
>> and others can chime in with their responses/experience. You can find
>> info about the mailing lists here,
>>
>> http://www.bioconductor.org/help/mailing-list/
>>
>> writeVcf() should only write out '.' for ID if the ID is missing. There is no restriction on the format of the ID. Can you provide a small sample of the vcf file you're having trouble with (just a few lines is enough)? Also include the output of your sessionInfo().
>>
>> Valerie
>>
>>
>> On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
>>> Hi Valerie,
>>>
>>> I've been using VariantAnnotation for quite a while now and it's been great!
>>>
>>> However I've just encountered a problem:
>>>
>>> If I read in a VCF and re-write it directly, the ID column has
>>> disappeared and becomes "." instead of the original
>>> "DEL:9586:0:1:0:0:0", even though the rownames of my VCF object are
>>> correctly populated with the original ID column.
>>>
>>> > library(VariantAnnotation)
>>>
>>> > in1 = readVcf("my.vcf.gz", "hg19")
>>>
>>> > writeVcf(in1, "test.vcf")
>>>
>>> I was wondering if that was because ID only accepts a specific format
>>> (rsID or chr:pos)?
>>>
>>> Thank you for your help
>>>
>>> Jennifer
>>>
>>> *Jennifer Becq*
>>>
>>> *Bioinformatics Scientist*
>>>
>>> *Illumina Cambridge Ltd*
>>>
>>> Tel: +44 (0) 1799 532300
>>>
>>> email: jbecq at illumina.com
>>>
>>
>