[Bioc-devel] VariantAnnotation: Conversion from VCF to VRanges for CG example VCF

Julian Gehring julian.gehring at embl.de
Thu Dec 12 21:18:06 CET 2013


Hi Valerie,

Thank you for the work you have put into this!

Best wishes
Julian


On 12/12/2013 08:57 PM, Valerie Obenchain wrote:
> Hi Julian,
>
> VariantAnnotation 1.9.23 has an updated chr7-sub.vcf.gz file and some
> checks to handle files with 'malformed' AD fields.
>
> My understanding is that 'AD' should have one value for REF and one value
> for each ALT: 2 ALTs give 3 AD values, 3 ALTs give 4 AD values, and so on.
>
> http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthPerAlleleBySample.html
>
>
> Most VCF files I've seen specify AD with 'Number=.' in the header. This
> makes sense because variants often differ in their number of ALTs.
> The chr7-sub.vcf.gz file had 'Number=2'. When 'Number' is an integer,
> scanVcf() parses the data into an appropriately dimensioned matrix or
> array. 'Number=2' for AD would be valid if all variants in the file had
> only 1 ALT (i.e., resulting in 2 AD values), but that isn't the case.
> That file had variants with 2 and 3 ALT values but reported only 2 AD
> values for each variant. This became a problem when you tried to expand
> the VCF.
>
> We've tried to make expand() 'smart' by expanding or replicating
> variables according to what we know. AD is one of the variables we look
> for when expanding, and for which we try to create the REF/ALT pairs.
> I've added some checks to handle AD fields that don't have a value for
> REF and each ALT.
>
> Thanks for reporting this.
>
> Valerie
>
>
> On 12/09/2013 12:20 PM, Valerie Obenchain wrote:
>> Hi Julian,
>>
>> I'm looking into this. It's likely that the chr7 file has an invalid
>> header for the AD variable (which should be appropriately handled by
>> readVcf()). I'll let you know when it's resolved.
>>
>> Valerie
>>
>>
>>
>>
>> On 12/07/2013 06:51 AM, Julian Gehring wrote:
>>> Hi,
>>>
>>> I tried to import the example VCFs from 'VariantAnnotation' and convert them
>>> to 'VRanges' objects.  While this works fine for 'chr22.vcf.gz', it
>>> fails for 'chr7-sub.vcf.gz':
>>>
>>> #+BEGIN_SRC R
>>>
>>> library(VariantAnnotation)
>>> f = system.file("extdata", "chr7-sub.vcf.gz",
>>>                 package = "VariantAnnotation", mustWork = TRUE)
>>> vcf = readVcf(f, "hg19")
>>> vr = as(vcf, "VRanges")
>>>
>>> #+END_SRC
>>>
>>> results in:
>>>
>>> #+BEGIN_EXAMPLE
>>>
>>> Error in validObject(.Object) :
>>>    invalid class “SummarizedExperiment” object: 'rowData' length differs from 'assays' nrow
>>>
>>> #+END_EXAMPLE
>>>
>>> The traceback returns:
>>>
>>> #+BEGIN_EXAMPLE
>>> 16: stop(msg, ": ", errors, domain = NA)
>>> 15: validObject(.Object)
>>> 14: initialize(value, ...)
>>> 13: initialize(value, ...)
>>> 12: new("SummarizedExperiment", exptData = exptData, rowData = rowData,
>>>          colData = colData, assays = assays, ...)
>>> 11: .local(assays, ...)
>>> 10: SummarizedExperiment(assays = geno, rowData = rowData, colData = colData,
>>>          exptData = exptData)
>>> 9: SummarizedExperiment(assays = geno, rowData = rowData, colData = colData,
>>>         exptData = exptData)
>>> 8: initialize(value, ...)
>>> 7: initialize(value, ...)
>>> 6: new(class, SummarizedExperiment(assays = geno, rowData = rowData,
>>>         colData = colData, exptData = exptData), fixed = fixed, info = info,
>>>         ...)
>>> 5: VCF(rowData = rdexp, colData = colData(x), exptData = exptData(x),
>>>         fixed = fexp, info = iexp, geno = gexp, ..., collapsed = FALSE)
>>> 4: expand(from)
>>> 3: expand(from)
>>> 2: asMethod(object)
>>> 1: as(vcf, "VRanges")
>>> #+END_EXAMPLE
>>>
>>> This occurs with both bioc-release and bioc-devel (all packages up to
>>> date as of 2013-12-07).
>>>
>>> Best wishes
>>> Julian
>>
>>
>
>
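
The AD rule described in Valerie's reply (one depth for REF plus one for each ALT) can be illustrated with a small base-R sketch. This is not the check that was added to VariantAnnotation. The helper name `check_ad_lengths` and the toy `alt`/`ad` lists are hypothetical and were made up for illustration. The toy data only mimic the situation discussed in the thread, where a header with 'Number=2' yields two AD values per variant regardless of how many ALTs the variant has.

```r
# Hypothetical helper (not part of VariantAnnotation): flag variants whose
# AD field does not hold one depth for REF plus one depth for each ALT.
check_ad_lengths <- function(alt, ad) {
  # alt: list of character vectors, one element per variant (ALT alleles)
  # ad:  list of integer vectors, one element per variant (AD values)
  stopifnot(length(alt) == length(ad))
  expected <- lengths(alt) + 1L   # REF + each ALT
  observed <- lengths(ad)
  data.frame(n_alt = lengths(alt),
             expected = expected,
             observed = observed,
             ok = observed == expected)
}

# Toy data (assumed, not taken from chr7-sub.vcf.gz): variants with
# 1, 2 and 3 ALT alleles, each carrying only 2 AD values.
alt <- list("A", c("A", "T"), c("A", "T", "G"))
ad  <- list(c(10L, 5L), c(8L, 3L), c(12L, 4L))

res <- check_ad_lengths(alt, ad)
print(res)

# Indices of variants whose AD length does not match the rule
bad <- which(!res$ok)
print(bad)
```

In this toy input only the first variant satisfies the rule. The variants with 2 and 3 ALTs are flagged, which corresponds to the kind of mismatch that made expand() fail in the original report.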


