[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation

Marc Carlson mcarlson at fhcrc.org
Tue Apr 30 23:16:02 CEST 2013


Related to this:

I have added getters for seqinfo (and friends) for the OrganismDb 
objects.  I have not added the setters yet though since that requires 
some refactoring of what an OrganismDb object actually is internally.  
But I intend to do this also.

   Marc



On 04/25/2013 09:32 AM, Valerie Obenchain wrote:
> Hi Vince, Kasper,
>
> cc'ing Herve and Marc.
>
> I think we have a couple of things going on so I wanted to clarify. 
> The 'genome' argument to readVcf() is assigned to the GRanges in 
> rowData with the "genome<-" setter. This is where .normargGenome() 
> gets called.
>
> setReplaceMethod("genome", "Seqinfo",
>     function(x, value)
>     {
>         x at genome <- .normargGenome(value, seqnames(x))
>         x
>     }
> )
>
> If the 'genome' replacement value is named, the name(s) must match the 
> seqnames, not the build. So we aren't talking about matching 
> compatible builds,
>
> fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> vcf <- readVcf(fl, c("b37"="hg19"))  ## this is wrong
> vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
>
> Instead the name must be the seqname, the value is the build,
>
> vcf <- readVcf(fl, c("20"="hg19"))  ## correct
> vcf <- readVcf(fl, "hg19")  ## also correct
>
> This requirement for 'genome' is not well documented on ?readVcf or 
> ?Seqinfo. We can fix that.
>
> The second thing is the issue of a flexible mapping between seqinfo 
> metadata for different institutions. Herve and Marc have worked on 
> this in AnnotationDbi. They can explain more about the 'SeqnameStyle' 
> and how it might be used more widely.
>
>
> Val
>
>
> On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
>> An "official" comment on this
>>    http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
>> with some more info in this discussion
>>
>> https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs 
>>
>>
>> Essentially it seems the b37 has been "patched" and this patched 
>> release is
>> not reflected in hg19 but may be (I don't know) reflected in the b37
>> download from NCBI
>>
>> Kasper
>>
>>
>> On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
>> kasperdanielhansen at gmail.com> wrote:
>>
>>> I agree with Vincent.  I have seen code from Herve in a package with 
>>> some
>>> standardization of chromosome names, and this code could perhaps be 
>>> used
>>> more widely so we don't have all the problems with chr1 vs chr01 vs 1.
>>>
>>> However, in this particular case, if Ulrich is actually interested 
>>> in the
>>> mitochondrial genome, he has a problem.
>>>
>>> hg19, which is the genome version from UCSC is consider equal to NCBIs
>>> b37.  However, as far as I understand, UCSC screwed up with the
>>> mitochondrial genome and used an old version for their hg19. So the 
>>> error
>>> message is in many ways right here: the two genomes are slightly 
>>> different
>>> because they have different mitochondrial genomes.
>>>
>>> Kasper
>>>
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>



More information about the Bioc-devel mailing list