[Bioc-devel] poor performance of snpsByOverlaps()
Robert Castelo
robert.castelo at upf.edu
Fri Jun 17 18:53:32 CEST 2016
hi,
the performance of snpsByOverlaps() in terms of time and memory
consumption is quite poor and i wonder whether there is some bug in the
code. here's one example:
library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37
gr <- GRanges(seqnames="ch10", IRanges(123276830, 123276830))
system.time(ov <- snpsByOverlaps(snps, gr))
   user  system elapsed
 33.768   0.124  33.955
system.time(ov <- snpsByOverlaps(snps, gr))
   user  system elapsed
 33.150   0.281  33.494
i've shown the call to snpsByOverlaps() twice to account for the fact
that maybe the first call was caching data and the second could be much
faster, but it is not the case.
if i do the same but with a larger GRanges object, for instance the one
attached to this email, then the memory consumption grows until about 20
Gbytes. to me this, in conjunction with the previous observation,
suggests that something is wrong with the caching of the data.
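to put numbers on both observations at once, here is a minimal sketch of how
the time and the peak memory of a single call could be recorded using only
base R. measure_call() is a hypothetical helper, not part of any package, and
the workload below is just a stand-in so that the sketch runs without the
SNPlocs package installed. note that gc() only accounts for memory managed by
R itself, so the figure should be read as a lower bound:

```r
# hypothetical helper: evaluate an expression and report the elapsed time
# plus the peak amount of memory R used while evaluating it
measure_call <- function(expr) {
  invisible(gc(reset = TRUE))            # reset the "max used" counters
  timing <- system.time(value <- expr)   # expr is a promise, forced here
  mem <- gc()                            # collect again and read the counters
  list(value = value,
       elapsed = timing[["elapsed"]],    # wall-clock seconds
       peak_mb = sum(mem[, ncol(mem)]))  # last column of gc(): max used, in Mb
}

# stand-in workload so the sketch is self-contained
res <- measure_call(sum(as.numeric(seq_len(1e6))))
res$elapsed
res$peak_mb

# with the objects from the example above one would call, for instance:
# res <- measure_call(snpsByOverlaps(snps, gr))
```

running this for GRanges objects of increasing size would show whether the
memory grows with the size of the query or is already large for a single
position.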
i look forward to your comments and possible solutions,
thanks!!!
robert.