[Bioc-devel] GPos slower than GRanges ?

Fri Feb 9 05:03:47 CET 2018

Hello,

I have just discovered the GPos class, and I would like to use it in
my "CAGEr" package, where for the moment I store single-nucleotide
positions of transcription start sites in GRanges of width 1.

But a simple microbenchmark suggests that, although GPos objects are
more memory-efficient, they may also be more CPU-hungry, at least
with the "range" function.

Is there a way to optimise this, or is it better to stay with
GRanges of width 1 when memory is not an issue?

> gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))

> granges1 <- GRanges(gpos1)

> microbenchmark::microbenchmark(range(granges1), range(gpos1))
Unit: milliseconds
            expr      min       lq    mean   median       uq      max neval cld
 range(granges1) 21.42761 21.97009 24.1627 22.24532 22.92655 179.9715   100  a 
    range(gpos1) 30.11515 30.84472 32.8824 31.36639 32.19281 104.3027   100   b

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.31.16 GenomeInfoDb_1.15.5   IRanges_2.13.22       S4Vectors_0.17.30    
[5] BiocGenerics_0.25.2  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14            XVector_0.19.8          MASS_7.3-47             splines_3.4.3          
 [5] zlibbioc_1.24.0         munsell_0.4.3           lattice_0.20-35         colorspace_1.3-2       
 [9] rlang_0.1.4             multcomp_1.4-8          plyr_1.8.4              tools_3.4.3            
[13] grid_3.4.3              gtable_0.2.0            TH.data_1.0-8           survival_2.41-3        
[17] yaml_2.1.15             lazyeval_0.2.1          tibble_1.3.4            Matrix_1.2-12          
[21] GenomeInfoDbData_0.99.1 ggplot2_2.2.1           codetools_0.2-15        microbenchmark_1.4-2.1 
[25] bitops_1.0-6            RCurl_1.95-4.10         sandwich_2.4-0          compiler_3.4.3         
[29] scales_0.5.0            mvtnorm_1.0-6           zoo_1.8-0     

(I have also run a benchmark on "real" data, which confirmed the test above.)

Have a nice day,

Charles

-- 
Charles Plessy, Ph.D. – RIKEN Center for Life Science Technologies
Division of Genomic Technologies – Genomics Miniaturization Technology Unit
1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045 Japan
■■□―――――――――― http://population-transcriptomics.org ――――――――――□■■