[Bioc-devel] VariantAnnotation writeVcf problem

Valerie Obenchain vobencha at fhcrc.org
Wed Nov 27 22:17:15 CET 2013


Hi,

I can't reproduce this error. Here is a read/write example using a file 
from VariantAnnotation where the results are as expected.

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
dest <- tempfile()
vcf1 <- readVcf(fl, "hg19")
 > rownames(vcf1)
[1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
[5] "microsat1"

writeVcf(vcf1, dest)
vcf2 <- readVcf(dest, "hg19")
 > rownames(vcf2)
[1] "rs6054257"      "20:17330_T/A"   "rs6040355"      "20:1230237_T/."
[5] "microsat1"

I need a reproducible example in order to help. Is the vcf you're 
working with publicly available?

Valerie

On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
> Hi Valerie,
>
> Thank you for cc'ing my message.
>
> The "ID" values are removed when reading a VCF through readVcf() and re-writing it with writeVcf():
>
> V = readVcf("test.vcf", "hg19")
> rownames(V)
> [1] "DEL:561590:0:1:0:0:0"        "BND:81424:0:1:1:1"
> [3] "BND:54200:0:1:0:1"           "DEL:572151:1:1:6:4:0"
> [5] "DEL:572151:1:1:6:4:1"        "DEL:572151:1:1:11:0:0"
> [7] "DEL:572433:0:0:5:2:0"        "DUP:TANDEM:572544:0:0:8:0:0"
> writeVcf(V, "writeTest.vcf")
> V2 = readVcf("writeTest.vcf", "hg19")
> rownames(V2)
> [1] "chr20:14855644_C/<DEL>"
> [2] "chr20:29627290_G/[chr2:114173319[G"
> [3] "chr20:35365307_T/]chr1:230941520]T"
> [4] "chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
> [5] "chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
> [6] "chr20:60520937_C/<DEL>"
> [7] "chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
> [8] "chr20:62063686_T/<DUP:TANDEM>"
>
> sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.8.6 Rsamtools_1.14.2        Biostrings_2.30.1
> [4] GenomicRanges_1.14.3    XVector_0.2.0           IRanges_1.20.5
> [7] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>   [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
>   [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
>   [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1         RSQLite_0.11.4
> [10] rtracklayer_1.22.0     stats4_3.0.2           tools_3.0.2
> [13] XML_3.98-1.1           zlibbioc_1.8.0
>
>
> *****  With the following VCF test.vcf:
>
> ##fileformat=VCFv4.1
> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
> chr20	14855644	DEL:561590:0:1:0:0:0	C	<DEL>	.	PASS	.
> chr20	29627290	BND:81424:0:1:1:1	G	[chr2:114173319[G	.	MaxDepth	.
> chr20	35365307	BND:54200:0:1:0:1	T	]chr1:230941520]T	.	PASS	.
> chr20	60520225	DEL:572151:1:1:6:4:0	AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG	A	
> .	PASS	.
> chr20	60520443	DEL:572151:1:1:6:4:1	GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGC
> ACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGC
> CGCCCCTTCTCACCGATGACGAGGAGCACTGCGA	GC	.	PASS	.
> chr20	60520937	DEL:572151:1:1:11:0:0	C	<DEL>	.	PASS	.
> chr20	61766068	DEL:572433:0:0:5:2:0	CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCC
> AC	CGG	.	PASS	.
> chr20	62063686	DUP:TANDEM:572544:0:0:8:0:0	T	<DUP:TANDEM>	.	PASS	.
>
> Thanks
> Jennifer
>
>
> Jennifer Becq
> Bioinformatics Scientist
> Illumina Cambridge Ltd
> Tel: +44 (0) 1799 532300
> email: jbecq at illumina.com
>
>
>
> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: 20 November 2013 17:28
> To: Becq, Jennifer; bioc-devel at r-project.org
> Subject: Re: VariantAnnotation writeVcf problem
>
> Hi Jennifer,
>
> I've cc'd your message to the Bioconductor mailing list. We have two lists, one for general questions and the other for bug reports/feature requests. Please post future questions to one of these lists instead of sending them to a single person. The lists reach a wider audience and others can chime in with their responses/experience. You can find info about the mailing lists here,
>
> http://www.bioconductor.org/help/mailing-list/
>
> writeVcf() should only write out '.' for ID if the ID is missing. There is no restriction on the format of the ID. Can you provide a small sample of the vcf file you're having trouble with (just a few lines is enough)? Also include the output of your sessionInfo().
>
> Valerie
>
>
> On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
>> Hi Valerie,
>>
>> I've been using VariantAnnotation for quite a while now and it's been great!
>>
>> However I've just encountered a problem:
>>
>> If I read in a VCF and re-write it directly, the ID column has
>> disappeared and becomes "." instead of the original
>> "DEL:9586:0:1:0:0:0", even though the rownames of my VCF object are
>> correctly populated with the original ID column.
>>
>>   > library(VariantAnnotation)
>>
>>   > in1 = readVcf("my.vcf.gz", "hg19")
>>
>>   > writeVcf(in1, "test.vcf")
>>
>> I was wondering if that was because ID only accepts a specific format
>> (rsID or chr:pos)?
>>
>> Thank you for your help
>>
>> Jennifer
>>
>> *Jennifer Becq*
>>
>> *Bioinformatics Scientist*
>>
>> *Illumina Cambridge Ltd*
>>
>> Tel: +44 (0) 1799 532300
>>
>> email: jbecq at illumina.com <mailto:jbecq at illumina.com>
>>
>



More information about the Bioc-devel mailing list