[Rd] default min-v/nsize parameters
luke-tierney at uiowa.edu
Sat Jan 17 17:39:35 CET 2015
Martin Morgan discussed this a year or so ago and, as I recall, bumped
up these values to the current defaults. I don't recall the details of
why we didn't go higher; maybe Martin does. I suspect the main
concern would be small-memory machines in student labs and in less
developed countries. If there were a way on all platforms to identify
how much memory is available, that might help in setting a default. Even
that isn't perfect, though, since you want something different on a
large-memory machine running one R process than on one running 16 R processes.
Best,
luke
On Thu, 15 Jan 2015, Michael Lawrence wrote:
> Just wanted to start a discussion on whether R could ship with more
> appropriate GC parameters. Right now, loading the recommended package
> Matrix leads to:
>
>> library(Matrix)
>> gc()
>           used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1076796 57.6    1368491 73.1  1198505 64.1
> Vcells 1671329 12.8    2685683 20.5  1932418 14.8
>
> Results may vary, but here R needed 64 MB of N cells and 15 MB of V cells
> (the "max used" column) to load one of the most important packages.
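The "(Mb)" columns are simply cell counts multiplied by the size of one cell. The short R sketch below recomputes some of the figures from the gc() output above. It rests on a few assumptions:

- a 64-bit build, where an Ncell takes 56 bytes and a Vcell takes 8 bytes, as described in ?Memory;
- gc() rounds megabytes (2^20 bytes) *up* to one decimal place, which is what these particular numbers are consistent with.

`cells_to_mb` and the two constants are hypothetical names used only for illustration.

```r
# Assumptions (not stated in the original post): 64-bit build with
# 56-byte Ncells and 8-byte Vcells, and gc() rounding Mb *up* to 0.1.
NCELL_BYTES <- 56
VCELL_BYTES <- 8

# Hypothetical helper: convert a cell count to Mb as gc() displays it.
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(10 * cells * bytes_per_cell / 2^20) / 10
}

# "used" and "max used" counts after library(Matrix), copied from gc() above
cells_to_mb(1076796, NCELL_BYTES)  # 57.6, Ncells used
cells_to_mb(1198505, NCELL_BYTES)  # 64.1, Ncells max used
cells_to_mb(1932418, VCELL_BYTES)  # 14.8, Vcells max used
```

Plain rounding would give 57.5 and 14.7 for two of these, so the round-up assumption matters for matching the printed table.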
>
> Currently, the default GC triggers are ~20 MB for N cells (on 64-bit
> systems) and ~6 MB for V cells. Martin Morgan found that this leads to a
> lot of GC overhead during package loading and, at least in our tests, can
> significantly increase the load time of complex packages.
>
> If we set the triggers at the command line beyond the reach of
> library(Matrix) (--min-vsize=2048M --min-nsize=45M), then we see:
>
>           used (Mb) gc trigger (Mb) max used  (Mb)
> Ncells 1076859 57.6   47185920 2520  6260069 334.4
> Vcells 1671431 12.8  268435456 2048  9010303  68.8
>
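The two flags map directly onto the "gc trigger" values shown above. The R sketch below does the conversion. It rests on these assumptions:

- the size suffixes described in ?Memory: G = 2^30, M = 2^20, K = 2^10, k = 1000;
- `--min-nsize` is given in cells, while `--min-vsize` is given in bytes;
- 56-byte Ncells and 8-byte Vcells, as on a 64-bit build.

`parse_size` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: parse a size such as "45M" using the suffix
# convention assumed from ?Memory (G = 2^30, M = 2^20, K = 2^10, k = 1000).
parse_size <- function(x) {
  mult <- c(G = 2^30, M = 2^20, K = 2^10, k = 1000)
  suffix <- substring(x, nchar(x))
  if (suffix %in% names(mult)) {
    as.numeric(substring(x, 1, nchar(x) - 1)) * mult[[suffix]]
  } else {
    as.numeric(x)
  }
}

ncells <- parse_size("45M")        # --min-nsize is already a cell count
vcells <- parse_size("2048M") / 8  # --min-vsize is in bytes; 8 bytes per Vcell

ncells              # = 47185920, the Ncells "gc trigger" above
ncells * 56 / 2^20  # = 2520 Mb, assuming 56-byte Ncells
vcells              # = 268435456, the Vcells "gc trigger" above
vcells * 8 / 2^20   # = 2048 Mb
```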
> So, by effectively disabling the GC, we let R consume 335 MB of N cells and
> 70 MB of V cells, but loading is much faster:
>
> Loading Matrix with default settings:
>> system.time(library(Matrix))
>    user  system elapsed
>   1.600   0.011   1.610
>
> With the high GC triggers (the same flags as above):
>> system.time(library(Matrix))
>    user  system elapsed
>   0.983   0.097   1.079
>
> Given modern hardware capabilities, and the need to load software
> efficiently before the user can do anything, perhaps we should bump the
> default settings so that the GC fires only sparingly when loading a large
> package.
>
> For Bioconductor users, we see this for library(GenomicRanges) (again with
> the high triggers):
>
>           used (Mb) gc trigger (Mb) max used  (Mb)
> Ncells 1322124 70.7   47185920 2520 15591302 832.7
> Vcells 1216015  9.3  268435456 2048 13992181 106.8
>
> So perhaps that user would want 900 MB of N cells and 100 MB of V cells as
> the triggers (corresponding to --min-vsize=100M --min-nsize=16M).
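Going the other way, from a target trigger size in MB to flag values, is a one-line division. The R sketch below reproduces the 16M/100M suggestion. It makes these assumptions:

- 56-byte Ncells;
- the "M" suffix means 2^20, so the `--min-nsize` value is in units of 2^20 cells;
- `--min-vsize` is given directly in bytes.

`nsize_flag` and `vsize_flag` are hypothetical names for illustration.

```r
# Hypothetical helpers: turn a target trigger size (Mb) into flag strings.
# Assumes 56-byte Ncells and "M" = 2^20 units.
nsize_flag <- function(target_mb, bytes_per_ncell = 56) {
  mcells <- round(target_mb / bytes_per_ncell)  # units of 2^20 cells
  sprintf("--min-nsize=%dM", as.integer(mcells))
}
vsize_flag <- function(target_mb) {
  # --min-vsize is in bytes, so Mb map straight onto the "M" suffix
  sprintf("--min-vsize=%dM", as.integer(round(target_mb)))
}

nsize_flag(900)  # "--min-nsize=16M", i.e. 16 * 56 = 896 Mb of Ncells
vsize_flag(100)  # "--min-vsize=100M"
```

Rounding to the nearest whole "M" is a design choice. It is why 900 MB lands on 16M (896 MB) rather than 17M (952 MB).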
>
> Thoughts?
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu