[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation
mcarlson at fhcrc.org
Tue Apr 30 23:16:02 CEST 2013
Related to this:
I have added getters for seqinfo (and friends) for the OrganismDb
objects. I have not added the setters yet though since that requires
some refactoring of what an OrganismDb object actually is internally.
But I intend to do this also.
On 04/25/2013 09:32 AM, Valerie Obenchain wrote:
> Hi Vince, Kasper,
> cc'ing Herve and Marc.
> I think we have a couple of things going on so I wanted to clarify.
> The 'genome' argument to readVcf() is assigned to the GRanges in
> rowData with the "genome<-" setter. This is where .normargGenome()
> gets called.
> setReplaceMethod("genome", "Seqinfo",
> function(x, value)
> x at genome <- .normargGenome(value, seqnames(x))
> If the 'genome' replacement value is named, the name(s) must match the
> seqnames, not the build. So we aren't talking about matching
> compatible builds,
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, c("b37"="hg19")) ## this is wrong
> vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
> Instead the name must be the seqname, the value is the build,
> vcf <- readVcf(fl, c("20"="hg19")) ## correct
> vcf <- readVcf(fl, "hg19") ## also correct
> This requirement for 'genome' is not well documented on ?readVcf or
> ?Seqinfo. We can fix that.
> The second thing is the issue of a flexible mapping between seqinfo
> metadata for different institutions. Herve and Marc have worked on
> this in AnnotationDbi. They can explain more about the 'SeqnameStyle'
> and how it might be used more widely.
> On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
>> An "official" comment on this
>> with some more info in this discussion
>> Essentially it seems the b37 has been "patched" and this patched
>> release is
>> not reflected in hg19 but may be (I don't know) reflected in the b37
>> download from NCBI
>> On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
>> kasperdanielhansen at gmail.com> wrote:
>>> I agree with Vincent. I have seen code from Herve in a package with
>>> standardization of chromosome names, and this code could perhaps be
>>> more widely so we don't have all the problems with chr1 vs chr01 vs 1.
>>> However, in this particular case, if Ulrich is actually interested
>>> in the
>>> mitochondrial genome, he has a problem.
>>> hg19, which is the genome version from UCSC is consider equal to NCBIs
>>> b37. However, as far as I understand, UCSC screwed up with the
>>> mitochondrial genome and used an old version for their hg19. So the
>>> message is in many ways right here: the two genomes are slightly
>>> because they have different mitochondrial genomes.
>> [[alternative HTML version deleted]]
>> Bioc-devel at r-project.org mailing list
More information about the Bioc-devel