[BioC] need additional sanity checks in TabixFile or readVcf

Valerie Obenchain vobencha at fhcrc.org
Wed Oct 2 20:22:37 CEST 2013


Thanks for the example.

In 1.7.52 (devel) an error is now thrown if the supplied file is
not compressed. Ideally we'd confirm the file is compressed when
creating the TabixFile object, but as far as I know there isn't a fast,
easy way to check that. The next best thing is to capture the error
when scanTabix() tries to read an uncompressed file. Here is the error
you will see:

 > vcf <- readVcf(tab, "hg19",myparam)
Error: scanVcf: scanTabix: read line failed, corrupt or invalid file?
   path: /home/vobencha/R/library/VariantAnnotation/extdata/ex2.vcf
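
For anyone who wants a guard on their own side before building the
TabixFile, here is a minimal sketch of a pre-check. It relies on bgzip
output being gzip-compatible, so the file should start with the gzip
magic bytes 1f 8b, which a plain-text VCF will not. The helper name
looksGzipped is hypothetical; it is not part of Rsamtools or
VariantAnnotation, and it cannot tell plain gzip from bgzip.

```r
# Hypothetical helper (an assumption for illustration, not a package function).
# Returns TRUE when the file begins with the gzip magic bytes 0x1f 0x8b.
looksGzipped <- function(path) {
    # Open in binary mode so the raw bytes are read without decompression.
    con <- file(path, "rb")
    on.exit(close(con))
    magic <- readBin(con, what = "raw", n = 2L)
    # Files shorter than two bytes cannot be gzip files.
    length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}
```

With the objects from the example quoted below, looksGzipped(vcfFile)
should be FALSE and looksGzipped(compressVcf) should be TRUE, so a
stopifnot(looksGzipped(path)) before TabixFile() would flag the mix-up
early. This is a heuristic only; the error from scanTabix() remains the
real check.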


Valerie

On 10/01/2013 10:46 AM, Jeremy Leipzig wrote:
> Valerie,
> Thanks. You will get the same cryptic error if you use the wrong file as
> the TabixFile.
>
> vcfFile <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
> from <- vcfFile
> to <- tempfile()
> compressVcf <- bgzip(from, to)
> idx <- indexTabix(compressVcf, "vcf")
> rng <- GRanges(seqnames="20", ranges=IRanges(start=c(14000,
> 1230000),end=c(15000,1240000)))
> myparam<-ScanVcfParam(which=rng)
>
> # incorrect: the idx file does not match the vcf used here; it
> # matches the compressed one
> tab <- TabixFile(vcfFile,idx)
> vcf <- readVcf(tab, "hg19",myparam)
> Error in lapply(names(vcf[[1]]), function(elt) { :
>    error in evaluating the argument 'X' in selecting a method for
> function 'lapply': Error in vcf[[1]] : subscript out of bounds
>
> #correct
> tab <- TabixFile(compressVcf,idx)
> vcf <- readVcf(tab, "hg19",myparam)
>
> Regards,
> Jeremy
>


