[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation
vobencha at fhcrc.org
Thu Apr 25 18:32:58 CEST 2013
Hi Vince, Kasper,
cc'ing Herve and Marc.
I think we have a couple of things going on so I wanted to clarify. The
'genome' argument to readVcf() is assigned to the GRanges in rowData
with the "genome<-" setter. This is where .normargGenome() gets called.
x at genome <- .normargGenome(value, seqnames(x))
If the 'genome' replacement value is named, the name(s) must match the
seqnames, not the build. So we aren't talking about matching compatible
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, c("b37"="hg19")) ## this is wrong
vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
Instead the name must be the seqname, the value is the build,
vcf <- readVcf(fl, c("20"="hg19")) ## correct
vcf <- readVcf(fl, "hg19") ## also correct
This requirement for 'genome' is not well documented on ?readVcf or
?Seqinfo. We can fix that.
The second thing is the issue of a flexible mapping between seqinfo
metadata for different institutions. Herve and Marc have worked on this
in AnnotationDbi. They can explain more about the 'SeqnameStyle' and how
it might be used more widely.
On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
> An "official" comment on this
> with some more info in this discussion
> Essentially it seems the b37 has been "patched" and this patched release is
> not reflected in hg19 but may be (I don't know) reflected in the b37
> download from NCBI
> On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
> kasperdanielhansen at gmail.com> wrote:
>> I agree with Vincent. I have seen code from Herve in a package with some
>> standardization of chromosome names, and this code could perhaps be used
>> more widely so we don't have all the problems with chr1 vs chr01 vs 1.
>> However, in this particular case, if Ulrich is actually interested in the
>> mitochondrial genome, he has a problem.
>> hg19, which is the genome version from UCSC is consider equal to NCBIs
>> b37. However, as far as I understand, UCSC screwed up with the
>> mitochondrial genome and used an old version for their hg19. So the error
>> message is in many ways right here: the two genomes are slightly different
>> because they have different mitochondrial genomes.
> [[alternative HTML version deleted]]
> Bioc-devel at r-project.org mailing list
More information about the Bioc-devel