[Bioc-devel] VariantAnnotation: streaming over a genomic range
Robert Castelo
robert.castelo at upf.edu
Tue Mar 27 13:02:46 CEST 2018
hi,
when i use the function 'readVcf()' from the 'VariantAnnotation' package
through a 'TabixFile' object, i can stream over it even if i do not
specify the argument 'yieldSize', although in that case the streaming is
trivial because is reading the whole file into main memory:
CEUvcf <- file.path(system.file("extdata", package="VariantFiltering"),
"CEUtrio.vcf.bgz")
tab <- TabixFile(CEUvcf)
open(tab)
vcf <- readVcf(tab, "hg19")
dim(vcf)
[1] 1000 3
vcf <- readVcf(tab, "hg19")
dim(vcf)
[1] 0 3
close(tab)
this is useful because i can package this into a function in my package
'VariantFiltering' using the same code regardless of whether 'yieldSize'
is specified or not, e.g.:
f <- function(fname, yieldSize=NA, genome, vcfpar=ScanVcfParam()) {
tab <- TabixFile(fname, yieldSize=yieldSize)
while (nrow(vcf <- readVcf(tab, genome=genome, param=vcfpar))) {
## do something with the 'vcf' object
}
## return something
}
however, if i further specify a range though the 'ScanVcfParam()'
function, but yet without any 'yieldSize' as before, the streaming does
not work anymore:
vcfpar <- ScanVcfParam(which=GRanges("7:107341641"))
open(tab)
vcf <- readVcf(tab, "hg19", param=vcfpar)
dim(vcf)
[1] 1 3
vcf <- readVcf(tab, "hg19", param=vcfpar)
dim(vcf)
[1] 1 3
vcf <- readVcf(tab, "hg19", param=vcfpar)
dim(vcf)
[1] 1 3
[...]
in this example i'm restricting the range to be read to 1 single variant
and successive calls to 'readVcf()' read the same variant over and over
again without streaming over the range.
is this the intended behavior?
my use case is that when analyzing VCFs from genomes, with several
million variants, i may want to restrict my analysis to a chunk of the
genome of several megabases, say 30 or 40, which is too much to load all
of it into main memory and would like to stream over these 30 or 40
megabases. would be possible to have this feature?
thanks!
robert.
More information about the Bioc-devel
mailing list