[Bioc-devel] Fwd: [BioC] Problem reading VCF file using readVcf from package VariantAnnotation

Valerie Obenchain vobencha at fhcrc.org
Thu Apr 25 18:32:58 CEST 2013


Hi Vince, Kasper,

cc'ing Herve and Marc.

I think we have a couple of things going on so I wanted to clarify. The 
'genome' argument to readVcf() is assigned to the GRanges in rowData 
with the "genome<-" setter. This is where .normargGenome() gets called.

setReplaceMethod("genome", "Seqinfo",
     function(x, value)
     {
         x at genome <- .normargGenome(value, seqnames(x))
         x
     }
)

If the 'genome' replacement value is named, the name(s) must match the 
seqnames, not the build. So we aren't talking about matching compatible 
builds,

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, c("b37"="hg19"))  ## this is wrong
vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong

Instead the name must be the seqname, the value is the build,

vcf <- readVcf(fl, c("20"="hg19"))  ## correct
vcf <- readVcf(fl, "hg19")  ## also correct

This requirement for 'genome' is not well documented on ?readVcf or 
?Seqinfo. We can fix that.

The second thing is the issue of a flexible mapping between seqinfo 
metadata for different institutions. Herve and Marc have worked on this 
in AnnotationDbi. They can explain more about the 'SeqnameStyle' and how 
it might be used more widely.


Val


On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
> An "official" comment on this
>    http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
> with some more info in this discussion
>
> https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs
>
> Essentially it seems the b37 has been "patched" and this patched release is
> not reflected in hg19 but may be (I don't know) reflected in the b37
> download from NCBI
>
> Kasper
>
>
> On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
> kasperdanielhansen at gmail.com> wrote:
>
>> I agree with Vincent.  I have seen code from Herve in a package with some
>> standardization of chromosome names, and this code could perhaps be used
>> more widely so we don't have all the problems with chr1 vs chr01 vs 1.
>>
>> However, in this particular case, if Ulrich is actually interested in the
>> mitochondrial genome, he has a problem.
>>
>> hg19, which is the genome version from UCSC is consider equal to NCBIs
>> b37.  However, as far as I understand, UCSC screwed up with the
>> mitochondrial genome and used an old version for their hg19.  So the error
>> message is in many ways right here: the two genomes are slightly different
>> because they have different mitochondrial genomes.
>>
>> Kasper
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



More information about the Bioc-devel mailing list