[Bioc-devel] VariantAnnotation/Rsamtools empty ranges bug and suggested patch
Valerie Obenchain
vobencha at fhcrc.org
Thu Dec 15 19:07:57 CET 2011
Hi Richard,
Thanks for identifying the bugs and for providing solutions.
Parsing empty INFO or GENO :
For this bug I put a check in the unpackVcf x="list", hdr="character"
method, which is a couple of levels up from what you suggest below. The
reason is that we want 'info' and 'geno' arguments to serve as an on/off
switch for parsing the fields. By checking lengths at the entry point,
we set can info=FALSE and/or geno=FALSE and the fields will not be
parsed. Fixed in Rsamtools 1.7.9.
Dropping ranges at 'end' :
This problem appears to be in the indexTabix() function. I'm working on
it now and will come back here when it's fixed.
Valerie
On 12/13/2011 04:22 AM, Richard Pearson wrote:
> Hi
>
> I've spotted a problem with the current devel versions of
> VariantAnnotation/Rsamtools when trying to read in variants from a
> region which has no variants. Section 2 of the current
> VariantAnnotation devel vignette gives an almost reproducible example
> of this. I say almost as I think the full set of commands (note
> inclusion of indexTabix command) should be:
>
> library(VariantAnnotation)
> vcffile <- system.file("extdata", "chr22.vcf.gz",
> package="VariantAnnotation")
> idx <- indexTabix(vcffile, "vcf")
> rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, end=10000))
> subst <- readVcf(vcffile, "hg19", param=rngs)
>
> When I run the above I get the following error:
>
> Error in strsplit(unlist(recs), "=", fixed = TRUE) :
> non-character argument
>
> The first variant in chr22.vcf.gz is at position 51151293, and hence
> there are no variants in the range 1000-10000.
>
> The following patch to Rsamtools gets the above to work for me, in
> what I think is a sensible way:
>
> Index: Rsamtools/R/scanVcf.R
> ===================================================================
> --- Rsamtools/R/scanVcf.R (revision 61294)
> +++ Rsamtools/R/scanVcf.R (working copy)
> @@ -248,7 +248,9 @@
> {
> if (!is.logical(info))
> x <- lapply(x, function(elt, id, n, type) {
> - elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n, type)
> + if(length(elt[["INFO"]]) > 0) {
> + elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n,
> type)
> + }
> elt
> }, rownames(info), info$Number, info$Type)
> if (!is.logical(geno))
>
>
> I've also realised that the above commands will only return up until
> the position a base before the end of the IRanges. Is this as
> expected? E.g.
>
> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000,
> end=51151350))
> > subst <- readVcf(vcffile, "hg19", param=rngs)
> > ranges(rowData(subst))
> IRanges of length 1
> start end width names
> [1] 51151293 51151293 1 rs189846332
>
> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000,
> end=51151351))
> > subst <- readVcf(vcffile, "hg19", param=rngs)
> > ranges(rowData(subst))
> IRanges of length 2
> start end width names
> [1] 51151293 51151293 1 rs189846332
> [2] 51151350 51151350 1 rs6009951
>
> Best wishes
>
> Richard
>
>
More information about the Bioc-devel
mailing list