[Bioc-devel] GPos slower than GRanges ?
Hervé Pagès
hpages at fredhutch.org
Fri Feb 9 10:54:06 CET 2018
Hi Charles,
On 02/08/2018 08:03 PM, Charles Plessy wrote:
> Hello,
>
> I have just discovered the GPos class, and I would like to use it in
> my "CAGEr" package, where for the moment I store single-nucleotide
> positions of transcription start sites in GRanges of width 1.
>
> But a simple microbenchmark suggests that, although GPos are more
> memory-efficient, they also may be more CPU-hungry, at least
> with the "range" function.
>
> Is there a way to optimise, or is it better to stay with
> GRanges of width 1 when memory is not an issue ?
>
>> gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))
>
>> granges1 <- GRanges(gpos1)
>
>> microbenchmark::microbenchmark(range(granges1), range(gpos1))
> Unit: milliseconds
>             expr      min       lq    mean   median       uq      max neval cld
>  range(granges1) 21.42761 21.97009 24.1627 22.24532 22.92655 179.9715   100  a
>     range(gpos1) 30.11515 30.84472 32.8824 31.36639 32.19281 104.3027   100   b
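Timing such small objects is not really meaningful: with only 20 positions, the measured time mostly reflects fixed per-call overhead rather than the size of the data.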
GPos objects are optimized to perform well when they contain long runs
of consecutive positions. For example:
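library(GenomicRanges)
library(microbenchmark)
# 2000 runs of 990 consecutive positions, separated by gaps of 10
gpos2 <- GPos(GRanges("chr1", successiveIRanges(rep(990, 2000), gapwidth=10)))
gr2 <- as(gpos2, "GRanges")
microbenchmark(range(gpos2), range(gr2))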
# Unit: milliseconds
#          expr      min       lq     mean   median       uq      max neval cld
#  range(gpos2) 102.4948 111.9229 137.5418 116.0058 134.2129 239.0805   100  a
#    range(gr2) 111.3651 118.2075 154.2758 133.3702 211.2164 232.4975   100   b
microbenchmark(coverage(gpos2), coverage(gr2))
# Unit: milliseconds
#             expr       min       lq     mean   median       uq      max neval cld
#  coverage(gpos2)  98.09502 106.3827 143.7039 111.9778 138.1875 304.8126   100  a
#    coverage(gr2) 152.82492 168.9123 204.8362 175.1129 189.7343 363.9795   100   b
So there is not a big difference, but GPos has a small advantage.
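To see why long runs of consecutive positions matter, here is a toy model in base R. It is NOT the actual GPos implementation, only a sketch under the assumption that positions can be kept as one (start, width) pair per run: it builds the same 1,980,000 positions as gpos2 both ways, recovers the runs from the expanded positions, and computes the overall extent (what range() reports for a single chromosome and strand) from only the 2000 runs. All variable names are hypothetical.

```r
# Toy model in base R (NOT the actual GPos internals): store single-nucleotide
# positions either one integer per position, or as one (start, width) pair per
# run of consecutive positions.
run_width <- 990L
gap_width <- 10L
n_runs <- 2000L

# Runs of 990 consecutive positions separated by gaps of 10, mirroring
# successiveIRanges(rep(990, 2000), gapwidth=10), which starts at 1.
run_starts <- 1L + (seq_len(n_runs) - 1L) * (run_width + gap_width)
run_widths <- rep(run_width, n_runs)

# Expanded representation: one integer per position (1,980,000 values).
positions <- unlist(lapply(run_starts, function(s) s + seq_len(run_width) - 1L))

# Collapse sorted positions back into runs: a new run starts wherever the
# distance to the previous position is not exactly 1.
new_run <- c(TRUE, diff(positions) != 1L)
found_starts <- positions[new_run]
found_widths <- tabulate(cumsum(new_run))

# The overall extent only needs the smallest start and the largest end,
# i.e. 2000 runs are scanned instead of 1,980,000 positions.
found_ends <- found_starts + found_widths - 1L
extent_from_runs <- c(min(found_starts), max(found_ends))
extent_from_positions <- range(positions)

# Memory: one integer per position versus two integers per run.
size_positions <- as.numeric(object.size(positions))
size_runs <- as.numeric(object.size(found_starts)) +
  as.numeric(object.size(found_widths))

cat("positions:", length(positions), "runs:", length(found_starts), "\n")
cat("extent:", extent_from_runs, "\n")
cat("bytes (positions):", size_positions, "bytes (runs):", size_runs, "\n")
```

The more consecutive positions each run holds, the bigger the gain; with isolated positions (runs of width 1) the run representation would save nothing.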
However, GPos has a big advantage in terms of memory footprint:
object.size(gpos2)
# 26520 bytes
object.size(gr2)
# 15849120 bytes
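The two sizes can be roughly accounted for, assuming each object keeps an IRanges-style pair of 4-byte integer vectors (start and width), and that the run-length encoded seqnames and strand are negligible here: gr2 stores one pair per position, while gpos2 stores one pair per run of consecutive positions.

```latex
\begin{aligned}
\texttt{gr2}:\quad & 2000 \times 990 = 1\,980\,000 \text{ ranges} \times 2 \times 4\ \text{bytes} = 15\,840\,000\ \text{bytes} \\
\texttt{gpos2}:\quad & 2000 \text{ runs} \times 2 \times 4\ \text{bytes} = 16\,000\ \text{bytes}
\end{aligned}
```

Under that assumption, the remainder of the reported figures (9,120 and 10,520 bytes respectively) would be fixed per-object overhead.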
Anyway, if memory is not an issue, then it won't make much difference
whether you use GRanges or GPos.
Cheers,
H.
>
>> sessionInfo()
> R version 3.4.3 (2017-11-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 9 (stretch)
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.7.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.7.0
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
> [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicRanges_1.31.16 GenomeInfoDb_1.15.5 IRanges_2.13.22 S4Vectors_0.17.30
> [5] BiocGenerics_0.25.2
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.14 XVector_0.19.8 MASS_7.3-47 splines_3.4.3
> [5] zlibbioc_1.24.0 munsell_0.4.3 lattice_0.20-35 colorspace_1.3-2
> [9] rlang_0.1.4 multcomp_1.4-8 plyr_1.8.4 tools_3.4.3
> [13] grid_3.4.3 gtable_0.2.0 TH.data_1.0-8 survival_2.41-3
> [17] yaml_2.1.15 lazyeval_0.2.1 tibble_1.3.4 Matrix_1.2-12
> [21] GenomeInfoDbData_0.99.1 ggplot2_2.2.1 codetools_0.2-15 microbenchmark_1.4-2.1
> [25] bitops_1.0-6 RCurl_1.95-4.10 sandwich_2.4-0 compiler_3.4.3
> [29] scales_0.5.0 mvtnorm_1.0-6 zoo_1.8-0
>
> (I have also made a benchmark on "real" data, which confirmed the test above)
>
> Have a nice day,
>
> Charles
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319