[Bioc-devel] VariantAnnotation/Rsamtools empty ranges bug and suggested patch

Valerie Obenchain vobencha at fhcrc.org
Fri Dec 16 15:45:53 CET 2011


Now fixed in Rsamtools 1.7.12 in devel and 1.6.3 in release.

Valerie


On 12/15/11 10:07, Valerie Obenchain wrote:
> Hi Richard,
>
> Thanks for identifying the bugs and for providing solutions.
>
> Parsing empty INFO or GENO :
>
> For this bug I put a check in the unpackVcf x="list", hdr="character" 
> method, which is a couple of levels up from what you suggest below. 
> The reason is that we want 'info' and 'geno' arguments to serve as an 
> on/off switch for parsing the fields. By checking lengths at the entry 
> point, we set can info=FALSE and/or geno=FALSE and the fields will not 
> be parsed. Fixed in Rsamtools 1.7.9.
>
>
> Dropping ranges at 'end' :
>
> This problem appears to be in the indexTabix() function. I'm working 
> on it now and will come back here when it's fixed.
>
> Valerie
>
> On 12/13/2011 04:22 AM, Richard Pearson wrote:
>> Hi
>>
>> I've spotted a problem with the current devel versions of 
>> VariantAnnotation/Rsamtools when trying to read in variants from a 
>> region which has no variants. Section 2 of the current 
>> VariantAnnotation devel vignette gives an almost reproducible example 
>> of this. I say almost as I think the full set of commands (note 
>> inclusion of indexTabix command) should be:
>>
>> library(VariantAnnotation)
>> vcffile <- system.file("extdata", "chr22.vcf.gz", 
>> package="VariantAnnotation")
>> idx <- indexTabix(vcffile, "vcf")
>> rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, end=10000))
>> subst <- readVcf(vcffile, "hg19", param=rngs)
>>
>> When I run the above I get the following error:
>>
>> Error in strsplit(unlist(recs), "=", fixed = TRUE) :
>>   non-character argument
>>
>> The first variant in chr22.vcf.gz is at position 51151293, and hence 
>> there are no variants in the range 1000-10000.
>>
>> The following patch to Rsamtools gets the above to work for me, in 
>> what I think is a sensible way:
>>
>> Index: Rsamtools/R/scanVcf.R
>> ===================================================================
>> --- Rsamtools/R/scanVcf.R    (revision 61294)
>> +++ Rsamtools/R/scanVcf.R    (working copy)
>> @@ -248,7 +248,9 @@
>>  {
>>      if (!is.logical(info))
>>          x <- lapply(x, function(elt, id, n, type) {
>> -            elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, n, type)
>> +            if(length(elt[["INFO"]]) > 0) {
>> +                elt[["INFO"]] <- .unpackVcfInfo(elt[["INFO"]], id, 
>> n, type)
>> +            }
>>              elt
>>          }, rownames(info), info$Number, info$Type)
>>      if (!is.logical(geno))
>>
>>
>> I've also realised that the above commands will only return up until 
>> the position a base before the end of the IRanges. Is this as 
>> expected? E.g.
>>
>> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, 
>> end=51151350))
>> > subst <- readVcf(vcffile, "hg19", param=rngs)
>> > ranges(rowData(subst))
>> IRanges of length 1
>>        start      end width       names
>> [1] 51151293 51151293     1 rs189846332
>>
>> > rngs <- GRanges(seqnames="22", ranges=IRanges(start=1000, 
>> end=51151351))
>> > subst <- readVcf(vcffile, "hg19", param=rngs)
>> > ranges(rowData(subst))
>> IRanges of length 2
>>        start      end width       names
>> [1] 51151293 51151293     1 rs189846332
>> [2] 51151350 51151350     1   rs6009951
>>
>> Best wishes
>>
>> Richard
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list