[Rd] default min-v/nsize parameters
Michael Lawrence
lawrence.michael at gene.com
Fri Jan 16 00:55:42 CET 2015
Just wanted to start a discussion on whether R could ship with more
appropriate GC parameters. Right now, loading the recommended package
Matrix leads to:
> library(Matrix)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1076796 57.6    1368491 73.1  1198505 64.1
Vcells 1671329 12.8    2685683 20.5  1932418 14.8
Results may vary, but here R needed 64MB of N cells and 15MB of V cells to
load one of the most important packages.
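For anyone wanting to check the Mb columns against the cell counts, here is a minimal sketch. It assumes the cell sizes documented in ?Memory for a 64-bit build (56 bytes per N cell, 8 bytes per V cell); cells_to_mb is just an illustrative helper name, not part of R.

```r
# Cell sizes on a 64-bit build, as documented in ?Memory (assumed here).
ncell_bytes <- 56
vcell_bytes <- 8

# Hypothetical helper: convert a cell count to Mb (1 Mb = 2^20 bytes),
# rounded up to the next 0.1 Mb.
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(10 * cells * bytes_per_cell / 2^20) / 10
}

cells_to_mb(1076796, ncell_bytes)  # Ncells "used" from the gc() output above
cells_to_mb(1671329, vcell_bytes)  # Vcells "used" from the gc() output above
```

With those assumptions the two calls give 57.6 and 12.8, matching the "used" row entries reported by gc().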
Currently, the default GC triggers are ~20MB of N cells (on 64-bit systems)
and ~6MB of V cells. Martin Morgan found that this leads to a lot of GC
overhead during package loading and, at least in our tests, can significantly
increase the load time of complex packages.
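One rough way to see that overhead for yourself is to compare gc.time() before and after the load. This is only a sketch: gc_share is a hypothetical helper, and the numbers will depend on the machine and on what is already loaded in the session.

```r
# Hypothetical helper: how much of an expression's elapsed time is spent in the GC?
gc_share <- function(expr) {
  gc.time(TRUE)                  # make sure GC timing is switched on
  invisible(gc())                # collect now, so the baseline below is clean
  gc_before <- gc.time()
  # expr is a promise and is only forced inside system.time();
  # gcFirst = FALSE avoids adding an extra collection to the measurement.
  timing <- system.time(expr, gcFirst = FALSE)
  gc_spent <- gc.time() - gc_before
  c(elapsed = timing[["elapsed"]], gc_elapsed = gc_spent[[3]])
}

# Usage in a fresh session (assumes Matrix is installed):
# gc_share(library(Matrix))
```

The gc_elapsed entry is the part of the elapsed time that went to garbage collection, so the ratio of the two gives a feel for how much the triggers matter.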
If we set the triggers at the command line beyond the reach of
library(Matrix) (--min-vsize=2048M --min-nsize=45M), then we see:
          used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 1076859 57.6   47185920 2520  6260069 334.4
Vcells 1671431 12.8  268435456 2048  9010303  68.8
So by effectively disabling the GC, we let R consume 335MB N + 70MB of V,
but loading goes a lot faster:
Loading Matrix with default settings:
> system.time(library(Matrix))
   user  system elapsed
  1.600   0.011   1.610
With high GC triggers:
> system.time(library(Matrix))
   user  system elapsed
  0.983   0.097   1.079
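To repeat this comparison without restarting R by hand, something like the following could be used. It is a sketch under a few assumptions: that Rscript sits in R.home("bin"), that it passes the --min-vsize/--min-nsize flags through to R, and that the package is installed. time_load is a made-up name.

```r
# Hypothetical helper: time library(pkg) in a fresh R process started with extra flags.
time_load <- function(pkg, flags = character()) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # Print only the elapsed time (third element of system.time()).
  expr <- sprintf("cat(system.time(library(%s))[[3]])", pkg)
  out <- system2(rscript, c("--vanilla", flags, "-e", shQuote(expr)),
                 stdout = TRUE)
  as.numeric(out[length(out)])
}

# Usage (assumes Matrix is installed):
# time_load("Matrix")
# time_load("Matrix", c("--min-vsize=2048M", "--min-nsize=45M"))
```

Each call starts its own process, so the second measurement is not helped by the package already being loaded.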
Given modern hardware capabilities, and since users have to load software
before they can do anything, perhaps we should bump the default settings so
that the GC fires only sparingly when loading a large package.
For users of Bioconductor, we see this for library(GenomicRanges):
          used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 1322124 70.7   47185920 2520 15591302 832.7
Vcells 1216015  9.3  268435456 2048 13992181 106.8
So perhaps that user would want 900 MB of N and 100 MB of V as the trigger
(corresponding to --min-vsize=100M --min-nsize=16M).
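As a rough illustration of how such flags could be derived from the "max used" column, here is a sketch. It assumes, as the gc trigger values above suggest, that --min-nsize counts cells, --min-vsize counts bytes, and M means 2^20; suggest_flags is a hypothetical helper. It rounds up to the next whole M, so it gives slightly different values than the rounder figures quoted above.

```r
# Hypothetical helper: turn peak cell counts into command-line trigger flags.
suggest_flags <- function(max_ncells, max_vcells, vcell_bytes = 8) {
  nsize_m <- ceiling(max_ncells / 2^20)                # --min-nsize is in cells
  vsize_m <- ceiling(max_vcells * vcell_bytes / 2^20)  # --min-vsize is in bytes
  sprintf("--min-vsize=%dM --min-nsize=%dM",
          as.integer(vsize_m), as.integer(nsize_m))
}

# "max used" values from the GenomicRanges table above
suggest_flags(15591302, 13992181)

# Or from the current session, after loading the packages of interest
g <- gc()
suggest_flags(g["Ncells", "max used"], g["Vcells", "max used"])
```

For the GenomicRanges numbers this yields --min-vsize=107M --min-nsize=15M.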
Thoughts?