[BioC] scanVcf: FORMAT 'GT' not found
seth redmond
seth.redmond at pasteur.fr
Mon Dec 3 19:18:03 CET 2012
Urgh, yeah I'd checked the tabs between the columns a hundred times, but I hadn't checked for trailing tabs in the header.
thanks for the nudge…
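
For anyone who hits the same error, here is a minimal sketch of how stray tabs could be spotted from the shell. The function name check_vcf_tabs is hypothetical (it is not part of vcftools or VariantAnnotation), and the sketch assumes gzip and awk are available and that the file is gzip- or bgzip-compressed. A trailing tab on the #CHROM line adds an empty extra column, so the header ends up with one more field than the data rows.

```shell
# Hypothetical helper, not part of any VCF toolkit: report trailing tabs
# on the #CHROM header line, and any data row whose tab-separated field
# count differs from that header's.
check_vcf_tabs() {
    gzip -dc "$1" | awk -F'\t' '
        /^##/ { next }                 # skip meta-information lines
        /^#CHROM/ {
            if ($0 ~ /\t+$/) print "line " NR ": trailing tab(s) in header"
            ncol = NF                  # header field count, empty trailing fields included
            next
        }
        NF != ncol { print "line " NR ": " NF " fields, header has " ncol }
    '
}

# Usage: check_vcf_tabs tmpvcf.vcf.gz
```

No output means the header and the data rows agree on the number of columns. Line numbers refer to the decompressed file.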
-s
On 3 Dec 2012, at 18:20, Valerie Obenchain wrote:
> Hi Seth,
>
> What version of VariantAnnotation are you using? Please provide the output of sessionInfo().
>
> I think there is a spacing problem in the file - are there true tabs between each field? Test using just the first line of the file so you can easily see/modify the tabs.
>
> I can't reproduce your error with the file output below, but I may be modifying the format as I cut and paste. If looking at the spacing does not solve the problem, please attach a small subset of the file - maybe just the first 5 rows.
>
>
> Valerie
>
> On 12/03/2012 03:16 AM, seth redmond wrote:
>> I keep running into an error with my VCF files but can't pinpoint where the problem is. The file has a number of missing genotypes, but nothing that I think should cause problems, and it passes vcf-validator without any errors.
>> Completely unremarkable code and head of the file below:
>>
>> Has anyone encountered this before? Or does anyone have suggestions as to what might be the issue?
>>
>> thanks
>>
>> -s
>>
>>> filename<-"tmpvcf.vcf.gz"
>>> vcftab<- TabixFile(filename, index = paste(filename, "tbi", sep="."));
>>> vcfScan<- scanVcf(filename)
>> trace: scanVcf(filename)
>> trace: scanVcf(con)
>> Error: scanVcf: record 1 field 1 FORMAT 'GT' not found
>> path: tmpvcf.vcf.gz
>>
>> bash-3.2$ vcf-validator tmpvcf.vcf.gz
>> The header tag 'reference' not present. (Not required but highly recommended.)
>> The header tag 'contig' not present for CHROM=2R. (Not required but highly recommended.)
>> The header tag 'contig' not present for CHROM=3L. (Not required but highly recommended.)
>>
>> ##fileformat=VCFv4.1
>> ##samtoolsVersion=0.1.18 (r982:295)
>> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
>> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
>> ##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
>> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
>> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
>> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
>> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
>> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
>> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
>> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
>> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
>> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
>> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
>> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
>> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
>> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
>> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
>> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
>> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
>> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
>> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
>> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
>> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
>> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
>> ##source_20121102.1=./vcf-merge -s Fd03_high.vcf.gz Fd03_low.vcf.gz Fd03_zero.vcf.gz
>> ##sourceFiles_20121102.1=0:Fd03_high.vcf.gz,1:Fd03_low.vcf.gz,2:Fd03_zero.vcf.gz
>> ##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
>> ##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
>> ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Fd03_high.vcf Fd03_low.vcf Fd03_zero.vcf
>> 2R 23990061 . G A 152.33 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,2,4;DP=9;FQ=18.1;MQ=35;PV4=0.17,1,1,1;SF=0,1,2;VDB=0.0474 GT:DP4:GQ:DP:PL 0/1:3,0,2,4:48:9:121,0,45 0/1:1,3,6,5:90:15:212,0,87 0/1:2,3,7,5:99:17:214,0,103
>> 2R 23990067 . G A 32.80 . AC1=1;AC=2;AF1=0.5;AN=4;DP4=4,1,2,3;DP=10;FQ=64.8;MQ=35;PV4=0.52,0.022,1,1;SF=0,1,2;VDB=0.0297 GT:DP4:GQ:DP:PL 0/1:4,1,2,3:95:10:92,0,106 .:6,8,2,1:.:17:20,.,. 0/1:8,8,1,4:59:21:56,0,255
>> 2R 23990070 . T C 109.67 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=11;FQ=10.4;MQ=35;PV4=0.2,0.091,1,1;SF=0,1,2;VDB=0.0474 GT:DP4:GQ:DP:PL 0/1:3,0,3,4:40:10:104,0,37 0/1:2,3,6,6:99:17:152,0,103 0/1:2,4,7,9:95:22:163,0,92
>> 2R 23990073 . T C 100.33 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=12;FQ=16.1;MQ=35;PV4=0.2,0.025,1,1;SF=0,1,2;VDB=0.0504 GT:DP4:GQ:DP:PL 0/1:3,0,3,4:46:10:101,0,43 0/1:2,3,6,5:99:16:134,0,103 0/1:2,4,7,9:99:22:156,0,113
>> 2R 23990083 . T G 99.92 . AC1=1;AC=2;AF1=0.4995;AN=4;DP4=3,3,3,0;DP=10;FQ=3.02;MQ=38;PV4=0.46,5.9e-05,0.23,1;SF=0,1,2;VDB=0.0426 GT:GQ:DP4:DP:PL .:.:3,3,3,0:9:27,.,. 0/1:38:2,1,6,8:17:165,0,35 0/1:81:1,4,8,10:23:190,0,78
>> 2R 23990100 . A C 114.67 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,3,1;DP=10;FQ=68;MQ=39;PV4=1,0.41,0.38,0.041;SF=0,1,2;VDB=0.0386 GT:DP4:GQ:DP:PL 0/1:4,2,3,1:98:10:95,0,141 0/1:4,5,3,6:99:18:167,0,172 0/1:4,6,3,6:99:19:172,0,185
>> 2R 23990108 . T A 21.40 . AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,2,3,2;DP=12;FQ=24;MQ=39;PV4=1,3.8e-05,1,1;SF=0,1,2;VDB=0.0075 GT:DP4:GQ:DP:PL 0/1:5,2,3,2:54:12:51,0,146 .:8,6,0,3:.:17:16,.,. .:5,10,1,2:.:18:1,.,.
>> 2R 23990114 . C T 113.00 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=6,3,4,1;DP=14;FQ=81;MQ=40;PV4=1,1,0.24,1;SF=0,1,2;VDB=0.0523 GT:DP4:GQ:DP:PL 0/1:6,3,4,1:99:14:108,0,181 0/1:4,4,3,5:99:16:166,0,147 0/1:3,4,2,7:99:16:155,0,158
>> 2R 23990116 . A T 20.25 . AC1=1;AC=1;AF1=0.4871;AN=2;DP4=8,3,2,1;DP=14;FQ=-14.2;MQ=40;PV4=1,6e-05,0.093,0.25;SF=0,1,2;VDB=0.0282 GT:GQ:DP4:DP:PL .:.:8,3,2,1:14:13,.,. 0/1:40:4,9,4,1:18:38,0,204 .:.:5,10,1,1:17:0,.,.
>> 2R 23990120 . G C 189.67 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,6,3;DP=15;FQ=103;MQ=40;PV4=1,1,0.026,1;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:4,2,6,3:99:15:188,0,130 0/1:0,3,8,7:19:18:252,0,16 0/1:2,5,4,8:99:19:219,0,134
>> 2R 23990143 . A C 190.67 . AC1=2;AC=6;AF1=1;AN=6;DP4=0,0,6,4;DP=11;FQ=-57;MQ=43;SF=0,1,2;VDB=0.0436 GT:DP4:GQ:DP:PL 1/1:0,0,6,4:57:10:248,30,0 1/1:0,0,3,6:51:9:212,27,0 1/1:0,0,2,7:51:9:211,27,0
>> 2R 23990147 . A T 15.36 . AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,6,2,1;DP=15;FQ=27;MQ=39;PV4=1,0.25,1,1;SF=0,1,2;VDB=0.0352 GT:DP4:GQ:DP:PL 0/1:5,6,2,1:57:14:54,0,230 .:7,5,0,2:.:14:15,.,. .:7,6,0,2:.:15:24,.,.
>> 2R 23990163 . G A 38.03 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=2,2,2,3;DP=14;FQ=44;MQ=43;PV4=1,4e-05,0.44,0.19;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:2,2,2,3:74:9:71,0,106 0/1:0,1,4,1:20:6:66,0,17 0/1:0,2,4,1:51:7:67,0,48
>> 2R 23990164 . T C 24.03 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,2,3;DP=14;FQ=22;MQ=41;PV4=1,0.00033,1,0.056;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:4,5,2,3:52:14:49,0,164 0/1:3,2,4,1:56:10:53,0,77 0/1:1,4,4,1:63:10:60,0,96
>> 2R 23990171 . T C 74.67 . AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,3,4;DP=16;FQ=71;MQ=41;PV4=1,6.1e-07,0.1,1;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:4,5,3,4:99:16:98,0,194 0/1:4,2,6,1:99:13:100,0,131 0/1:5,3,3,4:99:15:116,0,173
>> 2R 23990190 . C A 27.34 . AC1=1;AC=1;AF1=0.4997;AN=2;DP4=4,6,2,2;DP=14;FQ=4.77;MQ=43;PV4=1,2.3e-09,1,0.15;SF=0,1,2;VDB=0.0352 GT:DP4:GQ:DP:PL 0/1:4,6,2,2:28:14:30,0,225 .:8,1,0,1:.:10:0,.,. .:12,5,2,0:.:19:0,.,.
>> 2R 23990198 . G T 26.67 . AC1=0;AC=1;AF1=0;AN=2;DP4=6,7,2,0;DP=15;FQ=-28;MQ=44;PV4=0.47,0.0016,1,0.052;SF=0,1,2;VDB=0.0260 GT:GQ:DP4:DP:PL .:.:6,7,2,0:15:0,.,. .:.:6,1,1,0:8:3,.,. 0/1:55:10,2,5,1:18:52,0,200
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>