[Bioc-devel] VariantAnnotation writeVcf problem

Valerie Obenchain vobencha at fhcrc.org
Mon Dec 2 19:58:21 CET 2013


Files with a .vcf extension often get scrubbed from email because they 
are interpreted as a 'vCard' file. (At least this has been my 
experience.) Changing the file extension to something other than '.vcf' 
usually solves the problem.

I was able to reproduce the error with the file you pasted in the 
message. The bug was some old code looking for ":" in rownames. This was 
a legacy check and is no longer necessary (I should have removed it some 
time ago). Now fixed in release (1.8.7) and devel (1.19.16).

Thanks for persevering and reporting this bug.

Valerie


On 11/28/2013 02:13 AM, Becq, Jennifer wrote:
> Hi Valerie,
> The VCF that is causing the problem was at the bottom of my email, I can copy-paste it here again:
>
> ##fileformat=VCFv4.1
> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
> chr20	14855644	DEL:561590:0:1:0:0:0	C	<DEL>	.	PASS	.
> chr20	29627290	BND:81424:0:1:1:1	G	[chr2:114173319[G	.	MaxDepth	.
> chr20	35365307	BND:54200:0:1:0:1	T	]chr1:230941520]T	.	PASS	.
> chr20	60520225	DEL:572151:1:1:6:4:0	AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG	A	.
> 	PASS	.
> chr20	60520443	DEL:572151:1:1:6:4:1	GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCA
> CCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCG
> CCCCTTCTCACCGATGACGAGGAGCACTGCGA	GC	.	PASS	.
> chr20	60520937	DEL:572151:1:1:11:0:0	C	<DEL>	.	PASS	.
> chr20	61766068	DEL:572433:0:0:5:2:0	CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCA
> C	CGG	.	PASS	.
> chr20	62063686	DUP:TANDEM:572544:0:0:8:0:0	T	<DUP:TANDEM>	.	PASS	.
>
>
>
>
>
> Thanks
> Jennifer
>
> Jennifer Becq
> Senior Bioinformatics Scientist
> Illumina Cambridge Ltd
> Tel: +44 (0) 1799 532300
> email: jbecq at illumina.com
>
>
>
> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: 27 November 2013 21:17
> To: Becq, Jennifer; bioc-devel at r-project.org
> Subject: Re: VariantAnnotation writeVcf problem
>
> Hi,
>
> I can't reproduce this error. Here is a read/write example using a file from VariantAnnotation where the results are as expected.
>
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation") dest <- tempfile()
> vcf1 <- readVcf(fl, "hg19")
>   > rownames(vcf1)
> [1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
> [5] "microsat1"
>
> writeVcf(vcf1, dest)
> vcf2 <- readVcf(dest, "hg19")
>   > rownames(vcf2)
> [1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
> [5] "microsat1"
>
> I need a reproducible example in order to help. Is the vcf you're working with publicly available?
>
> Valerie
>
> On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
>> Hi Valerie,
>>
>> Thank you for cc'ing my message.
>>
>> The "ID" values are removed when reading a VCF through readVcf() and re-writing it with writeVcf():
>>
>> V = readVcf("test.vcf", "hg19")
>> rownames(V)
>> [1] "DEL:561590:0:1:0:0:0"        "BND:81424:0:1:1:1"
>> [3] "BND:54200:0:1:0:1"           "DEL:572151:1:1:6:4:0"
>> [5] "DEL:572151:1:1:6:4:1"        "DEL:572151:1:1:11:0:0"
>> [7] "DEL:572433:0:0:5:2:0"        "DUP:TANDEM:572544:0:0:8:0:0"
>> writeVcf(V, "writeTest.vcf")
>> V2 = readVcf("writeTest.vcf", "hg19")
>> rownames(V2)
>> [1] "chr20:14855644_C/<DEL>"
>> [2] "chr20:29627290_G/[chr2:114173319[G"
>> [3] "chr20:35365307_T/]chr1:230941520]T"
>> [4] "chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
>> [5] "chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
>> [6] "chr20:60520937_C/<DEL>"
>> [7] "chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
>> [8] "chr20:62063686_T/<DUP:TANDEM>"
>>
>> sessionInfo()
>> R version 3.0.2 Patched (2013-10-27 r64116)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.8.6 Rsamtools_1.14.2        Biostrings_2.30.1
>> [4] GenomicRanges_1.14.3    XVector_0.2.0           IRanges_1.20.5
>> [7] BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>>    [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
>>    [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
>>    [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1         RSQLite_0.11.4
>> [10] rtracklayer_1.22.0     stats4_3.0.2           tools_3.0.2
>> [13] XML_3.98-1.1           zlibbioc_1.8.0
>>
>>
>> *****  With the following VCF test.vcf:
>>
>> ##fileformat=VCFv4.1
>> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
>> chr20	14855644	DEL:561590:0:1:0:0:0	C	<DEL>	.	PASS	.
>> chr20	29627290	BND:81424:0:1:1:1	G	[chr2:114173319[G	.	MaxDepth	.
>> chr20	35365307	BND:54200:0:1:0:1	T	]chr1:230941520]T	.	PASS	.
>> chr20	60520225	DEL:572151:1:1:6:4:0	AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG	A	
>> .	PASS	.
>> chr20	60520443	DEL:572151:1:1:6:4:1	GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGC
>> ACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGC
>> CGCCCCTTCTCACCGATGACGAGGAGCACTGCGA	GC	.	PASS	.
>> chr20	60520937	DEL:572151:1:1:11:0:0	C	<DEL>	.	PASS	.
>> chr20	61766068	DEL:572433:0:0:5:2:0	CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCC
>> AC	CGG	.	PASS	.
>> chr20	62063686	DUP:TANDEM:572544:0:0:8:0:0	T	<DUP:TANDEM>	.	PASS	.
>>
>> Thanks
>> Jennifer
>>
>>
>> Jennifer Becq
>> Bioinformatics Scientist
>> Illumina Cambridge Ltd
>> Tel: +44 (0) 1799 532300
>> email: jbecq at illumina.com
>>
>>
>>
>> -----Original Message-----
>> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
>> Sent: 20 November 2013 17:28
>> To: Becq, Jennifer; bioc-devel at r-project.org
>> Subject: Re: VariantAnnotation writeVcf problem
>>
>> Hi Jennifer,
>>
>> I've cc'd your message to the Bioconductor mailing list. We have two
>> lists, one for general questions and the other for bug reports/feature
>> requests. Please post future questions to one of these lists instead
>> of sending them to a single person. The lists reach a wider audience
>> and others can chime in with their responses/experience. You can find
>> info about the mailing lists here,
>>
>> http://www.bioconductor.org/help/mailing-list/
>>
>> writeVcf() should only write out '.' for ID if the ID is missing. There is no restriction on the format of the ID. Can you provide a small sample of the vcf file you're having trouble with (just a few lines is enough)? Also include the output of your sessionInfo().
>>
>> Valerie
>>
>>
>> On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
>>> Hi Valerie,
>>>
>>> I've been using VariantAnnotation for quite a while now and it's been great!
>>>
>>> However I've just encountered a problem:
>>>
>>> If I read in a VCF and re-write it directly, the ID column has
>>> disappeared and becomes "." instead of the original
>>> "DEL:9586:0:1:0:0:0", even though the rownames of my VCF object are
>>> correctly populated with the original ID column.
>>>
>>>    > library(VariantAnnotation)
>>>
>>>    > in1 = readVcf("my.vcf.gz", "hg19")
>>>
>>>    > writeVcf(in1, "test.vcf")
>>>
>>> I was wondering if that was because ID only accepts a specific format
>>> (rsID or chr:pos)?
>>>
>>> Thank you for your help
>>>
>>> Jennifer
>>>
>>> *Jennifer Becq*
>>>
>>> *Bioinformatics Scientist*
>>>
>>> *Illumina Cambridge Ltd*
>>>
>>> Tel: +44 (0) 1799 532300
>>>
>>> email: jbecq at illumina.com <mailto:jbecq at illumina.com>
>>>
>>
>



More information about the Bioc-devel mailing list