[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation
Marc Carlson
mcarlson at fhcrc.org
Tue Apr 30 23:16:02 CEST 2013
Related to this:
I have added getters for seqinfo (and friends) for the OrganismDb
objects. I have not added the setters yet though since that requires
some refactoring of what an OrganismDb object actually is internally.
But I intend to do this also.
Marc
On 04/25/2013 09:32 AM, Valerie Obenchain wrote:
> Hi Vince, Kasper,
>
> cc'ing Herve and Marc.
>
> I think we have a couple of things going on so I wanted to clarify.
> The 'genome' argument to readVcf() is assigned to the GRanges in
> rowData with the "genome<-" setter. This is where .normargGenome()
> gets called.
>
> setReplaceMethod("genome", "Seqinfo",
> function(x, value)
> {
> x at genome <- .normargGenome(value, seqnames(x))
> x
> }
> )
>
> If the 'genome' replacement value is named, the name(s) must match the
> seqnames, not the build. So we aren't talking about matching
> compatible builds,
>
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, c("b37"="hg19")) ## this is wrong
> vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
>
> Instead the name must be the seqname, the value is the build,
>
> vcf <- readVcf(fl, c("20"="hg19")) ## correct
> vcf <- readVcf(fl, "hg19") ## also correct
>
> This requirement for 'genome' is not well documented on ?readVcf or
> ?Seqinfo. We can fix that.
>
> The second thing is the issue of a flexible mapping between seqinfo
> metadata for different institutions. Herve and Marc have worked on
> this in AnnotationDbi. They can explain more about the 'SeqnameStyle'
> and how it might be used more widely.
>
>
> Val
>
>
> On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
>> An "official" comment on this
>> http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
>> with some more info in this discussion
>>
>> https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs
>>
>>
>> Essentially it seems the b37 has been "patched" and this patched
>> release is
>> not reflected in hg19 but may be (I don't know) reflected in the b37
>> download from NCBI
>>
>> Kasper
>>
>>
>> On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
>> kasperdanielhansen at gmail.com> wrote:
>>
>>> I agree with Vincent. I have seen code from Herve in a package with
>>> some
>>> standardization of chromosome names, and this code could perhaps be
>>> used
>>> more widely so we don't have all the problems with chr1 vs chr01 vs 1.
>>>
>>> However, in this particular case, if Ulrich is actually interested
>>> in the
>>> mitochondrial genome, he has a problem.
>>>
>>> hg19, which is the genome version from UCSC is consider equal to NCBIs
>>> b37. However, as far as I understand, UCSC screwed up with the
>>> mitochondrial genome and used an old version for their hg19. So the
>>> error
>>> message is in many ways right here: the two genomes are slightly
>>> different
>>> because they have different mitochondrial genomes.
>>>
>>> Kasper
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
More information about the Bioc-devel
mailing list