[Bioc-devel] VariantAnnotation/Rsamtools empty ranges bug and suggested patch

Valerie Obenchain vobencha at fhcrc.org
Thu Dec 15 19:07:57 CET 2011


Hi Richard,

Thanks for identifying the bugs and for providing solutions.

Parsing empty INFO or GENO :

For this bug I put a check in the unpackVcf x="list", hdr="character" 
method, which is a couple of levels up from what you suggest below. The 
reason is that we want 'info' and 'geno' arguments to serve as an on/off 
switch for parsing the fields. By checking lengths at the entry point, 
we set can info=FALSE and/or geno=FALSE and the fields will not be 
parsed. Fixed in Rsamtools 1.7.9.


Dropping ranges at 'end' :

This problem appears to be in the indexTabix() function. I'm working on 
it now and will come back here when it's fixed.

Valerie

On 12/13/2011 04:22 AM, Richard Pearson wrote:
> Hi
>
> I've spotted a problem with the current devel versions of 
> VariantAnnotation/Rsamtools when trying to read in variants from a 
> region which has no variants. Section 2 of the current 
> VariantAnnotation devel vignette gives an almost reproducible example 
> of this. I say almost as I think the full set of commands (note 
> inclusion of indexTabix command) should be:
>
> library(VariantAnnotation)
> vcffile <- system.file("extdata", "chr22.vcf.gz", 
> package="VariantAnnotation")
> idx <- indexTabix(vcffile, "vcf")
> rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, end=10000))
> subst <- readVcf(vcffile, "hg19", param=rngs)
>
> When I run the above I get the following error:
>
> Error in strsplit(unlist(recs), "=", fixed = TRUE) :
>   non-character argument
>
> The first variant in chr22.vcf.gz is at position 51151293, and hence 
> there are no variants in the range 1000-10000.
>
> The following patch to Rsamtools gets the above to work for me, in 
> what I think is a sensible way:
>
> Index: Rsamtools/R/scanVcf.R
> ===================================================================
> --- Rsamtools/R/scanVcf.R    (revision 61294)
> +++ Rsamtools/R/scanVcf.R    (working copy)
> @@ -248,7 +248,9 @@
>  {
>      if (!is.logical(info))
>          x <- lapply(x, function(elt, id, n, type) {
> -            elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n, type)
> +            if(length(elt[["INFO"]]) > 0) {
> +                elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n, 
> type)
> +            }
>              elt
>          }, rownames(info), info$Number, info$Type)
>      if (!is.logical(geno))
>
>
> I've also realised that the above commands will only return up until 
> the position a base before the end of the IRanges. Is this as 
> expected? E.g.
>
> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, 
> end=51151350))
> > subst <- readVcf(vcffile, "hg19", param=rngs)
> > ranges(rowData(subst))
> IRanges of length 1
>        start      end width       names
> [1] 51151293 51151293     1 rs189846332
>
> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, 
> end=51151351))
> > subst <- readVcf(vcffile, "hg19", param=rngs)
> > ranges(rowData(subst))
> IRanges of length 2
>        start      end width       names
> [1] 51151293 51151293     1 rs189846332
> [2] 51151350 51151350     1   rs6009951
>
> Best wishes
>
> Richard
>
>



More information about the Bioc-devel mailing list