[Bioc-devel] poor performance of snpsByOverlaps()

Robert Castelo robert.castelo at upf.edu
Fri Jun 17 18:53:32 CEST 2016


hi,

the performance of snpsByOverlaps() in terms of time and memory 
consumption is quite poor and i wonder whether there is some bug in the 
code. here's one example:

library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37

gr <- GRanges(seqnames="ch10", IRanges(123276830, 123276830))

system.time(ov <- snpsByOverlaps(snps, gr))
    user  system elapsed
  33.768   0.124  33.955

system.time(ov <- snpsByOverlaps(snps, gr))
    user  system elapsed
  33.150   0.281  33.494


i ran snpsByOverlaps() twice in case the first call was caching data and 
the second one would then be much faster, but that is not the case.

if i do the same with a larger GRanges object, for instance the one 
attached to this email, memory consumption grows to about 20 Gbytes. 
to me, this, together with the previous observation, suggests that 
something is wrong with the caching of the data.
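
to make the memory figure easier to reproduce, here is a minimal sketch 
of how time and peak memory could be recorded for a single call using 
only base R. peak_mem_mb() is a hypothetical helper name, not part of 
any package, and the workload below is a stand-in so that the snippet 
runs without the SNPlocs package. note that gc() only tracks memory 
managed by R itself, so this is a rough figure rather than an exact one.

```r
# hypothetical helper (not part of any package): evaluate an expression and
# report the elapsed time plus the peak memory recorded by R's gc() counters
peak_mem_mb <- function(expr) {
    gc(reset = TRUE)                           # reset the "max used" counters
    elapsed <- system.time(expr)[["elapsed"]]  # expr is evaluated here
    stats <- gc()
    # the last column of gc() output is "max used" in Mb; summing the
    # Ncells and Vcells rows gives the peak for the whole R session,
    # baseline usage included
    c(elapsed_sec = elapsed, peak_mb = sum(stats[, ncol(stats)]))
}

# stand-in workload (about 80 Mb of doubles); with the packages loaded,
# one would pass snpsByOverlaps(snps, gr) instead
res <- peak_mem_mb(x <- numeric(1e7))
print(res)
```

comparing peak_mb between the single-position query and the larger 
GRanges object would show how memory use scales with the query size.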



i look forward to your comments and possible solutions,


thanks!!!


robert.

