[BioC] VariantAnnotation - dots in the INFO field give an error
Jarno Tuimala
jtuimala at gmail.com
Tue Nov 13 14:36:46 CET 2012
Dear Vincent,
You're right! The VCF file was in fact read successfully and the object was created.
So, problem solved: it was a user error.
An older version of the package does seem to give an error, though, and since
I was running both versions in parallel, I mixed up the two sessions. Sorry about
that.
- Jarno
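
Vincent's distinction is the crux of the thread. In R, a warning is reported but does not stop evaluation, so `vcf <- readVcf(...)` still completes and assigns the object. An error aborts the call before anything is assigned. The base-R sketch below illustrates this with two hypothetical stand-in functions, `read_with_warning()` and `read_with_error()`. Neither is part of VariantAnnotation, and both return a placeholder rather than a real VCF object. In an actual session, `exists("vcf")` or `class(vcf)` confirms whether the object was created, and `warnings()` re-displays the most recent warnings.

```r
# Hypothetical stand-in for readVcf(): signals the same kind of warning seen
# in the thread but still returns a value (a placeholder, not a real VCF object).
read_with_warning <- function() {
  warning("record 1 (and others?) INFO '.' not found")
  "placeholder object"
}

# Hypothetical stand-in for a reader that fails with an error instead.
read_with_error <- function() {
  stop("unable to parse file")
}

# Record warning messages without interrupting evaluation.
captured <- character()
vcf <- withCallingHandlers(
  read_with_warning(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))
    invokeRestart("muffleWarning")  # carry on as if the warning had not been raised
  }
)

# The assignment happened despite the warning.
print(exists("vcf"))
print(captured)

# With an error, evaluation stops before a value is returned;
# tryCatch() substitutes NULL so we can see that nothing was produced.
vcf_failed <- tryCatch(read_with_error(), error = function(e) NULL)
print(is.null(vcf_failed))
```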
On Mon, Nov 12, 2012 at 1:12 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> What you reported is a warning, not an error. Did the object "vcf" get
> created?
>
> On Mon, Nov 12, 2012 at 4:39 AM, Jarno Tuimala <jtuimala at gmail.com> wrote:
>>
>> Hello!
>>
>> I have a problem reading a VCF file with the VariantAnnotation
>> package. The filtered VCF file (attached as text below) has been
>> generated with vcftools.
>>
>> This is what I tried in R and the resulting error message:
>>
>> > library(VariantAnnotation)
>> > vcf<-readVcf("vcftools.filtered.vcf", "hg19")
>>
>> Warning message:
>> In doTryCatch(return(expr), name, parentenv, handler) :
>> record 1 (and others?) INFO '.' not found
>>
>> If I understood it correctly, the dots in the INFO column of the VCF
>> file cause the problem.
>>
>> Is there an alternative way to read this VCF file and annotate it with
>> the VariantAnnotation package?
>>
>> Best Regards,
>> Jarno
>>
>>
>> ----
>>
>> This is the session info:
>>
>> R version 2.15.1 Patched (2012-07-25 r59963)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=Finnish_Finland.1252 LC_CTYPE=Finnish_Finland.1252 LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C LC_TIME=Finnish_Finland.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.4.3 Rsamtools_1.10.1 Biostrings_2.26.2 GenomicRanges_1.10.2 IRanges_1.16.3 BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.20.2 Biobase_2.18.0 biomaRt_2.14.0 bitops_1.0-4.1 BSgenome_1.26.1 DBI_0.2-5 GenomicFeatures_1.10.0 parallel_2.15.1
>> [9] RCurl_1.95-1.1 RSQLite_0.11.2 rtracklayer_1.18.0 stats4_2.15.1 tools_2.15.1 XML_3.95-0.1 zlibbioc_1.4.0
>>
>>
>> And this is the VCF file:
>>
>> ##fileformat=VCFv4.1
>> ##samtoolsVersion=0.1.18 (r982:295)
>> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
>> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
>> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
>> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
>> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
>> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
>> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
>> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
>> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
>> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
>> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
>> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
>> ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
>> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
>> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
>> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
>> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
>> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
>> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
>> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
>> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
>> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
>> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
>> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00171 HG00174 NA18486 NA18489
>> 20 6731335 . T C 80.5 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:113,12,0:13
>> 20 6732603 . A T 25.7 . . GT:PL:GQ 0/0:0,6,54:8 0/0:0,0,0:3 0/0:0,0,0:3 0/1:58,0,27:35
>> 20 6736189 . A G 47.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:79,6,0:6
>> 20 6736562 . C A 20.4 . . GT:PL:GQ 0/0:0,0,0:4 0/0:0,0,0:4 0/1:53,0,32:40 0/0:0,9,98:11
>> 20 6737384 . A G 62 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 0/1:92,0,95:92
>> 20 6737551 . G A 26.3 . . GT:PL:GQ 1/1:30,3,0:4 0/1:0,3,40:4 0/1:0,0,0:3 1/1:34,3,0:4
>> 20 6738766 . T A 34.3 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,3,33:4 0/1:0,0,0:3 1/1:69,6,0:4
>> 20 6739398 . G A 64 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:96,9,0:10
>> 20 6740366 . C T 25.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:57,6,0:6
>> 20 6740850 . G A 34.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,6,59:6 0/1:0,0,0:3 1/1:70,6,0:3
>> 20 6743016 . T C 87.2 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,3,31:3 0/1:0,0,0:3 1/1:124,12,0:10
>> 20 6743306 . A C 39.8 . . GT:PL:GQ 0/1:0,0,0:3 1/1:71,6,0:6 0/1:0,0,0:3 0/1:0,0,0:3
>> 20 6746498 . C T 17.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,3,38:4 0/1:31,3,0:4 0/1:24,0,54:26
>> 20 6749158 . C A 18.3 . . GT:PL:GQ 0/0:0,3,29:8 0/0:0,3,32:8 0/1:53,0,30:40 0/0:0,21,159:25
>> 20 6749671 . A C 21.3 . . GT:PL:GQ 0/0:0,9,65:7 0/1:33,3,0:3 0/1:28,3,0:3 0/1:0,0,0:3
>> 20 6751034 . A G 999 . . GT:PL:GQ 0/0:0,24,189:19 0/1:33,0,141:38 1/1:255,105,0:99 1/1:255,66,0:65
>> 20 6751316 . A G 155 . . GT:PL:GQ 0/0:0,3,22:4 0/0:0,6,43:6 1/1:116,12,0:8 0/1:84,0,25:29
>> 20 6754246 . G A 16.4 . . GT:PL:GQ 0/0:0,0,0:3 0/0:0,3,20:6 0/0:0,0,0:3 0/1:48,0,43:45
>> 20 6755598 . T G 46 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:78,9,0:10
>> 20 6756217 . G A 14.2 . . GT:PL:GQ 0/0:0,3,38:7 0/0:0,3,38:7 0/0:0,0,0:4 0/1:47,0,26:34
>> 20 6760431 . C A 36.8 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 1/1:68,6,0:6
>> 20 6761512 . C T 104 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:136,12,0:13
>> 20 6762025 . G A 29.3 . . GT:PL:GQ 0/1:0,3,37:4 1/1:32,3,0:4 0/1:0,0,0:3 1/1:35,3,0:4
>> 20 6765841 . A C 35.3 . . GT:PL:GQ 0/0:0,3,31:4 0/1:0,0,0:3 0/1:0,0,0:3 1/1:70,6,0:4
>> 20 6767119 . G C 104 . . GT:PL:GQ 1/1:0,0,0:3 1/1:0,0,0:3 1/1:0,0,0:3 1/1:136,12,0:13
>> 20 6767354 . C T 24 . . GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 0/1:0,0,0:3 0/1:54,0,111:55
>> 20 6767543 . T C 14.2 . . GT:PL:GQ 0/0:0,3,31:7 0/0:0,3,32:7 0/0:0,0,0:4 0/1:47,0,22:30
>> 20 6769102 . T TC 117 . . GT:PL:GQ 1/1:0,0,0:6 1/1:40,3,0:9 1/1:40,3,0:9 1/1:80,6,0:11
>> 20 6769533 . G A 21.4 . . GT:PL:GQ 0/1:0,0,0:3 0/0:0,6,64:6 0/1:0,0,0:3 1/1:57,6,0:3
>> 20 6769676 . A G 27.2 . . GT:PL:GQ 0/0:0,3,32:5 0/0:0,3,34:5 0/0:0,0,0:3 0/1:64,6,0:3
>> 20 6769714 . T C 63.2 . . GT:PL:GQ 1/1:68,6,0:9 1/1:0,0,0:4 1/1:0,0,0:4 1/1:29,3,0:7
>> 20 6769877 . T C 14.5 . . GT:PL:GQ 0/1:27,0,27:27 0/1:0,0,0:3 0/0:0,6,68:6 0/1:26,3,0:4
>> 20 6769893 . C A 16.7 . . GT:PL:GQ 0/0:0,3,38:5 0/0:0,0,0:3 0/0:0,6,63:8 0/1:54,6,0:4
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
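
In VCF 4.1, a period in the INFO column is the missing-value marker, meaning the record carries no INFO entries. Every record in the file above is therefore valid in that respect. The wording of the warning suggests the reader looked up "." as if it were an INFO key, though that is an inference from the message text rather than from the package source. The base-R sketch below uses a hypothetical `parse_info()` helper, which is not a VariantAnnotation function, to show one way of treating "." as an empty INFO field when parsing records by hand. The two records are copied from the file above with tab separators restored, and the `DP=14;MQ=20;INDEL` string is invented for comparison.

```r
# Hypothetical helper (not part of VariantAnnotation): parse one VCF INFO
# string such as "DP=14;MQ=20;INDEL" into a named list. A lone "." is the
# VCF missing-value marker, so it yields an empty list instead of a key ".".
parse_info <- function(info) {
  if (identical(info, ".") || !nzchar(info)) {
    return(list())
  }
  entries <- strsplit(info, ";", fixed = TRUE)[[1]]
  entries <- entries[nzchar(entries)]  # ignore empty pieces, e.g. from ";;"
  out <- list()
  for (entry in entries) {
    eq <- regexpr("=", entry, fixed = TRUE)
    if (eq == -1) {
      out[[entry]] <- TRUE  # Flag-type key (e.g. INDEL) carries no value
    } else {
      key <- substr(entry, 1, eq - 1)
      out[[key]] <- substr(entry, eq + 1, nchar(entry))  # value kept as text
    }
  }
  out
}

# Two records copied from the file above, with tab separators restored.
records <- c(
  "20\t6731335\t.\tT\tC\t80.5\t.\t.\tGT:PL:GQ\t1/1:0,0,0:3\t1/1:0,0,0:3\t1/1:0,0,0:3\t1/1:113,12,0:13",
  "20\t6732603\t.\tA\tT\t25.7\t.\t.\tGT:PL:GQ\t0/0:0,6,54:8\t0/0:0,0,0:3\t0/0:0,0,0:3\t0/1:58,0,27:35"
)

# INFO is the 8th tab-separated column (CHROM POS ID REF ALT QUAL FILTER INFO).
info <- vapply(strsplit(records, "\t", fixed = TRUE), `[`, character(1), 8)
parsed <- lapply(info, parse_info)

print(info)            # both records have INFO "."
print(lengths(parsed)) # number of INFO entries found per record
str(parse_info("DP=14;MQ=20;INDEL"))  # invented INFO string, for comparison
```

Values are left as character strings here; converting them to Integer, Float or Flag would rely on the `##INFO` header declarations, which this sketch does not read.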