[Bioc-devel] VariantAnnotation: Conversion from VCF to VRanges for CG example VCF

Valerie Obenchain vobencha at fhcrc.org
Thu Dec 12 20:57:28 CET 2013


Hi Julian,

VariantAnnotation 1.9.23 has an updated chr7-sub.vcf.gz file and some 
checks to handle files with 'malformed' AD fields.

My understanding is that 'AD' should have a value for REF and a value 
for each ALT. If we have 2 ALTs then we should have 3 AD values, 3 ALTs 
would result in 4 AD values etc.

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthPerAlleleBySample.html
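
To make the counting rule concrete, here is a minimal base R sketch. The
'alt' vector is made-up example data, not taken from chr7-sub.vcf.gz, and
nothing here is VariantAnnotation code.

```r
# Hypothetical ALT column: one comma-separated string per record.
alt <- c("T", "A,G", "C,G,T")

# Count the ALT alleles in each record.
n_alt <- lengths(strsplit(alt, ",", fixed = TRUE))

# AD should hold one value for REF plus one value per ALT allele.
expected_ad <- n_alt + 1L
expected_ad
# 2 3 4
```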

Most VCF files I've seen specify AD with 'Number=.' in the header. This
makes sense because the number of ALTs often differs between variants.
The chr7-sub.vcf.gz file had 'Number=2'. When 'Number' is an integer,
scanVcf() parses the data into an appropriately dimensioned matrix or
array. 'Number=2' for AD would be valid if all variants in the file had
only 1 ALT (i.e., resulting in 2 AD values), but that isn't the case.
That file had variants with 2 and 3 ALT values but reported only 2 AD
values for each variant. This became a problem when you tried to expand
the VCF.
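
The kind of mismatch described above can be checked with a few lines of
base R. This is only a sketch under assumptions: the header line, the
'Description' text, the ALT counts and the helper name fixed_ok() are all
invented for illustration, and this is not how scanVcf() or readVcf()
validate headers.

```r
# Hypothetical FORMAT header line declaring a fixed number of AD values.
header <- '##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allelic depths">'

# Extract the 'Number' field; "." means the count varies per record.
number <- sub(".*Number=([^,>]+).*", "\\1", header)

# Hypothetical ALT allele counts for three records.
n_alt <- c(1L, 2L, 3L)

# A fixed integer 'Number' is only consistent with AD if every record
# needs exactly that many values (one for REF plus one per ALT).
fixed_ok <- function(number, n_alt) {
  if (identical(number, ".")) return(TRUE)
  n <- suppressWarnings(as.integer(number))
  if (is.na(n)) return(NA)  # other 'Number' codes are not handled here
  all(n_alt + 1L == n)
}

fixed_ok(number, n_alt)
```

With these made-up records the check fails, because the records with 2
and 3 ALTs would need 3 and 4 AD values rather than 2.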

We've tried to make expand() 'smart' by expanding or replicating
variables according to what we know. AD is one of the variables we look
for when expanding, and we try to create the REF/ALT pairs from it. I've
added some checks to handle AD fields that don't have a value for REF
and each ALT.
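
As a rough illustration of the idea (not the actual expand() code), the
base R sketch below turns each record into one row per ALT and pairs the
REF depth with each ALT depth. The data, the helper name expand_ad() and
the choice to fill malformed records with NA are all assumptions made for
this example.

```r
# Hypothetical collapsed records: ALT alleles and AD values per record.
alt <- list(c("T"), c("A", "G"), c("C", "G", "T"))
ad  <- list(c(10L, 5L), c(8L, 3L, 2L), c(7L, 4L))  # third record is short

# Expand to one row per ALT allele. If AD does not hold one value for
# REF plus one per ALT, fill the depths with NA (an assumed policy).
expand_ad <- function(alt, ad) {
  rows <- lapply(seq_along(alt), function(i) {
    n  <- length(alt[[i]])
    ok <- length(ad[[i]]) == n + 1L
    data.frame(
      record = i,
      alt    = alt[[i]],
      ref_ad = if (ok) ad[[i]][1] else NA_integer_,
      alt_ad = if (ok) ad[[i]][-1] else rep(NA_integer_, n),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

expanded <- expand_ad(alt, ad)
expanded
```

The first two records expand into REF/ALT depth pairs, while the third,
which has 3 ALTs but only 2 AD values, yields three rows with NA depths.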

Thanks for reporting this.

Valerie


On 12/09/2013 12:20 PM, Valerie Obenchain wrote:
> Hi Julian,
>
> I'm looking into this. It's likely that the chr7 file has an invalid
> header for the AD variable (which should be appropriately handled by
> readVcf()). I'll let you know when it's resolved.
>
> Valerie
>
>
>
>
> On 12/07/2013 06:51 AM, Julian Gehring wrote:
>> Hi,
>>
>> I tried to import example VCFs from 'VariantAnnotation' and convert them
>> to 'VRanges' objects.  While this works fine for 'chr22.vcf.gz', it
>> fails for the 'chr7-sub.vcf.gz' VCF:
>>
>> #+BEGIN_SRC R
>>
>> library(VariantAnnotation)
>> f = system.file("extdata", "chr7-sub.vcf.gz", package =
>> "VariantAnnotation", mustWork = TRUE)
>> vcf = readVcf(f, "hg19")
>> vr = as(vcf, "VRanges")
>>
>> #+END_SRC
>>
>> results in:
>>
>> #+BEGIN_EXAMPLE
>>
>> Error in validObject(.Object) :
>>    invalid class “SummarizedExperiment” object: 'rowData' length differs
>> from 'assays' nrow
>>
>> #+END_EXAMPLE
>>
>> The traceback returns:
>>
>> #+BEGIN_EXAMPLE
>> 16: stop(msg, ": ", errors, domain = NA)
>> 15: validObject(.Object)
>> 14: initialize(value, ...)
>> 13: initialize(value, ...)
>> 12: new("SummarizedExperiment", exptData = exptData, rowData = rowData,
>>          colData = colData, assays = assays, ...)
>> 11: .local(assays, ...)
>> 10: SummarizedExperiment(assays = geno, rowData = rowData, colData =
>> colData,
>>          exptData = exptData)
>> 9: SummarizedExperiment(assays = geno, rowData = rowData, colData =
>> colData,
>>         exptData = exptData)
>> 8: initialize(value, ...)
>> 7: initialize(value, ...)
>> 6: new(class, SummarizedExperiment(assays = geno, rowData = rowData,
>>         colData = colData, exptData = exptData), fixed = fixed, info =
>> info,
>>         ...)
>> 5: VCF(rowData = rdexp, colData = colData(x), exptData = exptData(x),
>>         fixed = fexp, info = iexp, geno = gexp, ..., collapsed = FALSE)
>> 4: expand(from)
>> 3: expand(from)
>> 2: asMethod(object)
>> 1: as(vcf, "VRanges")
>> #+END_EXAMPLE
>>
>> This occurs both with bioc-release and bioc-devel (all packages up to
>> date 2013-12-07).
>>
>> Best wishes
>> Julian
>
>


-- 
Valerie Obenchain

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319


