[Bioc-devel] VariantAnnotation writeVcf problem
Valerie Obenchain
vobencha at fhcrc.org
Wed Nov 27 22:17:15 CET 2013
Hi,
I can't reproduce this error. Here is a read/write example using a file
from VariantAnnotation where the results are as expected.
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
dest <- tempfile()
vcf1 <- readVcf(fl, "hg19")
> rownames(vcf1)
[1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
[5] "microsat1"
writeVcf(vcf1, dest)
vcf2 <- readVcf(dest, "hg19")
> rownames(vcf2)
[1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
[5] "microsat1"
I need a reproducible example in order to help. Is the vcf you're
working with publicly available?
Valerie
On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
> Hi Valerie,
>
> Thank you for cc'ing my message.
>
> The "ID" values are removed when reading a VCF through readVcf() and re-writing it with writeVcf():
>
> V = readVcf("test.vcf", "hg19")
> rownames(V)
> [1] "DEL:561590:0:1:0:0:0" "BND:81424:0:1:1:1"
> [3] "BND:54200:0:1:0:1" "DEL:572151:1:1:6:4:0"
> [5] "DEL:572151:1:1:6:4:1" "DEL:572151:1:1:11:0:0"
> [7] "DEL:572433:0:0:5:2:0" "DUP:TANDEM:572544:0:0:8:0:0"
> writeVcf(V, "writeTest.vcf")
> V2 = readVcf("writeTest.vcf", "hg19")
> rownames(V2)
> [1] "chr20:14855644_C/<DEL>"
> [2] "chr20:29627290_G/[chr2:114173319[G"
> [3] "chr20:35365307_T/]chr1:230941520]T"
> [4] "chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
> [5] "chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
> [6] "chr20:60520937_C/<DEL>"
> [7] "chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
> [8] "chr20:62063686_T/<DUP:TANDEM>"
>
> sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.8.6 Rsamtools_1.14.2 Biostrings_2.30.1
> [4] GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.5
> [7] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 Biobase_2.22.0 biomaRt_2.18.0
> [4] bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
> [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1 RSQLite_0.11.4
> [10] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2
> [13] XML_3.98-1.1 zlibbioc_1.8.0
>
>
> ***** With the following VCF test.vcf:
>
> ##fileformat=VCFv4.1
> #CHROM POS ID REF ALT QUAL FILTER INFO
> chr20 14855644 DEL:561590:0:1:0:0:0 C <DEL> . PASS .
> chr20 29627290 BND:81424:0:1:1:1 G [chr2:114173319[G . MaxDepth .
> chr20 35365307 BND:54200:0:1:0:1 T ]chr1:230941520]T . PASS .
> chr20 60520225 DEL:572151:1:1:6:4:0 AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG A
> . PASS .
> chr20 60520443 DEL:572151:1:1:6:4:1 GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGC
> ACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGC
> CGCCCCTTCTCACCGATGACGAGGAGCACTGCGA GC . PASS .
> chr20 60520937 DEL:572151:1:1:11:0:0 C <DEL> . PASS .
> chr20 61766068 DEL:572433:0:0:5:2:0 CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCC
> AC CGG . PASS .
> chr20 62063686 DUP:TANDEM:572544:0:0:8:0:0 T <DUP:TANDEM> . PASS .
>
> Thanks
> Jennifer
>
>
> Jennifer Becq
> Bioinformatics Scientist
> Illumina Cambridge Ltd
> Tel: +44 (0) 1799 532300
> email: jbecq at illumina.com
>
>
>
> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: 20 November 2013 17:28
> To: Becq, Jennifer; bioc-devel at r-project.org
> Subject: Re: VariantAnnotation writeVcf problem
>
> Hi Jennifer,
>
> I've cc'd your message to the Bioconductor mailing list. We have two lists, one for general questions and the other for bug reports/feature requests. Please post future questions to one of these lists instead of sending them to a single person. The lists reach a wider audience and others can chime in with their responses/experience. You can find info about the mailing lists here,
>
> http://www.bioconductor.org/help/mailing-list/
>
> writeVcf() should only write out '.' for ID if the ID is missing. There is no restriction on the format of the ID. Can you provide a small sample of the vcf file you're having trouble with (just a few lines is enough)? Also include the output of your sessionInfo().
>
> Valerie
>
>
> On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
>> Hi Valerie,
>>
>> I've been using VariantAnnotation for quite a while now and it's been great!
>>
>> However I've just encountered a problem:
>>
>> If I read in a VCF and re-write it directly, the ID column has
>> disappeared and becomes "." instead of the original
>> "DEL:9586:0:1:0:0:0", even though the rownames of my VCF object are
>> correctly populated with the original ID column.
>>
>> > library(VariantAnnotation)
>>
>> > in1 = readVcf("my.vcf.gz", "hg19")
>>
>> > writeVcf(in1, "test.vcf")
>>
>> I was wondering if that was because ID only accepts a specific format
>> (rsID or chr:pos)?
>>
>> Thank you for your help
>>
>> Jennifer
>>
>> *Jennifer Becq*
>>
>> *Bioinformatics Scientist*
>>
>> *Illumina Cambridge Ltd*
>>
>> Tel: +44 (0) 1799 532300
>>
>> email: jbecq at illumina.com <mailto:jbecq at illumina.com>
>>
>
More information about the Bioc-devel
mailing list