[BioC] VariantAnnotation - dots in the INFO field give an error
Jarno Tuimala
jtuimala at gmail.com
Mon Nov 12 10:39:03 CET 2012
Hello!
I have a problem reading a VCF file with the VariantAnnotation
package. The filtered VCF file (attached as text below) has been
generated with vcftools.
This is what I tried in R, and the resulting warning message:
> library(VariantAnnotation)
> vcf<-readVcf("vcftools.filtered.vcf", "hg19")
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
record 1 (and others?) INFO '.' not found
If I understood it correctly, the dots in the INFO column of the VCF
file cause the problem.
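To check that assumption independently of VariantAnnotation, a minimal base R sketch like the one below can count how many records carry a bare '.' in the INFO column. The helper name count_missing_info is hypothetical, the two-record sample only mimics the layout of the file shown further down, and the code assumes that fields are separated by tabs or spaces and that INFO is the 8th column.

```r
# Hypothetical helper: count records whose INFO column (column 8) is '.'.
# Assumes whitespace-separated fields and no whitespace inside any field.
count_missing_info <- function(path) {
  lines <- readLines(path)
  # Drop meta-information lines ('##...') and the '#CHROM' header line
  records <- lines[!grepl("^#", lines) & nzchar(lines)]
  fields <- strsplit(records, "[ \t]+")
  info <- vapply(fields, function(f) f[8], character(1))
  c(total = length(info), missing = sum(info %in% "."))
}

# Small stand-in for the real file, written to a temporary path
sample_vcf <- c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00171",
  "20\t6731335\t.\tT\tC\t80.5\t.\t.\tGT:PL:GQ\t1/1:0,0,0:3",
  "20\t6732603\t.\tA\tT\t25.7\t.\t.\tGT:PL:GQ\t0/0:0,6,54:8"
)
path <- tempfile(fileext = ".vcf")
writeLines(sample_vcf, path)

res <- count_missing_info(path)
print(res)
```

Pointing count_missing_info at "vcftools.filtered.vcf" instead of the temporary file would show whether every record has an empty INFO column, which is what the warning seems to suggest.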
Is there an alternative way to read this VCF file and annotate it with
the VariantAnnotation package?
Best Regards,
Jarno
----
This is the session info:
R version 2.15.1 Patched (2012-07-25 r59963)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Finnish_Finland.1252 LC_CTYPE=Finnish_Finland.1252
LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C
LC_TIME=Finnish_Finland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.4.3 Rsamtools_1.10.1 Biostrings_2.26.2
GenomicRanges_1.10.2 IRanges_1.16.3
BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.20.2 Biobase_2.18.0 biomaRt_2.14.0
bitops_1.0-4.1 BSgenome_1.26.1 DBI_0.2-5
GenomicFeatures_1.10.0 parallel_2.15.1
[9] RCurl_1.95-1.1 RSQLite_0.11.2 rtracklayer_1.18.0
stats4_2.15.1 tools_2.15.1 XML_3.95-0.1
zlibbioc_1.4.0
And this is the VCF file:
##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00171 HG00174 NA18486 NA18489
20 6731335 . T C 80.5 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:113,12,0:13
20 6732603 . A T 25.7 . . GT:PL:GQ 0/0:0,6,54:8 0/0:0,0,0:3 0/0:0,0,0:3 0/1:58,0,27:35
20 6736189 . A G 47.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:79,6,0:6
20 6736562 . C A 20.4 . . GT:PL:GQ 0/0:0,0,0:4 0/0:0,0,0:4 0/1:53,0,32:40 0/0:0,9,98:11
20 6737384 . A G 62 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 0/1:92,0,95:92
20 6737551 . G A 26.3 . . GT:PL:GQ 1/1:30,3,0:4 0/1:0,3,40:4 0/1:0,0,0:3 1/1:34,3,0:4
20 6738766 . T A 34.3 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,3,33:4 0/1:0,0,0:3 1/1:69,6,0:4
20 6739398 . G A 64 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:96,9,0:10
20 6740366 . C T 25.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:57,6,0:6
20 6740850 . G A 34.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,6,59:6 0/1:0,0,0:3 1/1:70,6,0:3
20 6743016 . T C 87.2 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,3,31:3 0/1:0,0,0:3 1/1:124,12,0:10
20 6743306 . A C 39.8 . . GT:PL:GQ 0/1:0,0,0:3 1/1:71,6,0:6 0/1:0,0,0:3 0/1:0,0,0:3
20 6746498 . C T 17.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,3,38:4 0/1:31,3,0:4 0/1:24,0,54:26
20 6749158 . C A 18.3 . . GT:PL:GQ 0/0:0,3,29:8 0/0:0,3,32:8 0/1:53,0,30:40 0/0:0,21,159:25
20 6749671 . A C 21.3 . . GT:PL:GQ 0/0:0,9,65:7 0/1:33,3,0:3 0/1:28,3,0:3 0/1:0,0,0:3
20 6751034 . A G 999 . . GT:PL:GQ 0/0:0,24,189:19 0/1:33,0,141:38 1/1:255,105,0:99 1/1:255,66,0:65
20 6751316 . A G 155 . . GT:PL:GQ 0/0:0,3,22:4 0/0:0,6,43:6 1/1:116,12,0:8 0/1:84,0,25:29
20 6754246 . G A 16.4 . . GT:PL:GQ 0/0:0,0,0:3 0/0:0,3,20:6 0/0:0,0,0:3 0/1:48,0,43:45
20 6755598 . T G 46 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:78,9,0:10
20 6756217 . G A 14.2 . . GT:PL:GQ 0/0:0,3,38:7 0/0:0,3,38:7 0/0:0,0,0:4 0/1:47,0,26:34
20 6760431 . C A 36.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:68,6,0:6
20 6761512 . C T 104 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:136,12,0:13
20 6762025 . G A 29.3 . . GT:PL:GQ 0/1:0,3,37:4 1/1:32,3,0:4 0/1:0,0,0:3 1/1:35,3,0:4
20 6765841 . A C 35.3 . . GT:PL:GQ 0/0:0,3,31:4 0/1:0,0,0:3 0/1:0,0,0:3 1/1:70,6,0:4
20 6767119 . G C 104 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:136,12,0:13
20 6767354 . C T 24 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 0/1:54,0,111:55
20 6767543 . T C 14.2 . . GT:PL:GQ 0/0:0,3,31:7 0/0:0,3,32:7 0/0:0,0,0:4 0/1:47,0,22:30
20 6769102 . T TC 117 . . GT:PL:GQ 1/1:0,0,0:6 1/1:40,3,0:9 1/1:40,3,0:9 1/1:80,6,0:11
20 6769533 . G A 21.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,6,64:6 0/1:0,0,0:3 1/1:57,6,0:3
20 6769676 . A G 27.2 . . GT:PL:GQ 0/0:0,3,32:5 0/0:0,3,34:5 0/0:0,0,0:3 0/1:64,6,0:3
20 6769714 . T C 63.2 . . GT:PL:GQ 1/1:68,6,0:9 1/1:0,0,0:4 1/1:0,0,0:4 1/1:29,3,0:7
20 6769877 . T C 14.5 . . GT:PL:GQ 0/1:27,0,27:27 0/1:0,0,0:3 0/0:0,6,68:6 0/1:26,3,0:4
20 6769893 . C A 16.7 . . GT:PL:GQ 0/0:0,3,38:5 0/0:0,0,0:3 0/0:0,6,63:8 0/1:54,6,0:4