[Rd] default min-v/nsize parameters

luke-tierney at uiowa.edu
Sat Jan 17 17:39:35 CET 2015


Martin Morgan discussed this a year or so ago and, as I recall, bumped
up these values to the current defaults. I don't recall the details of
why we didn't go higher -- maybe Martin does. I suspect the main
concern would be small-memory machines in student labs and in less
developed countries. If there were a way, on all platforms, to identify
how much memory is available, that might help in setting a default,
though that isn't perfect: you want something different on a
large-memory machine running one R process than on one running 16 R
processes.
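The following is a minimal sketch of the kind of memory-based default described above. It assumes the caller already knows the machine's memory and how many R processes will share it, since R has no portable way to query this. The function name `suggest_gc_flags`, the 50% budget fraction, and the even N/V split are hypothetical choices made for illustration. They are not R's policy. The sketch also assumes a 64-bit build, where one Ncell takes 56 bytes.

```r
# Hypothetical helper: derive --min-nsize / --min-vsize values from a
# per-process memory budget. budget_fraction and n_share are illustrative
# assumptions, not anything R itself uses.
suggest_gc_flags <- function(total_mb, n_procs = 1,
                             budget_fraction = 0.5, n_share = 0.5) {
  per_proc_mb <- total_mb * budget_fraction / n_procs
  ncell_bytes <- 56                  # one Ncell on a 64-bit build
  # --min-nsize counts cells; "M" means 2^20 cells, so MB / 56 gives M units
  n_m <- floor(per_proc_mb * n_share / ncell_bytes)
  # --min-vsize counts bytes; "M" means 2^20 bytes
  v_m <- floor(per_proc_mb * (1 - n_share))
  c(sprintf("--min-nsize=%dM", as.integer(n_m)),
    sprintf("--min-vsize=%dM", as.integer(v_m)))
}

# The same 16 GB machine, used by one R process versus sixteen
suggest_gc_flags(16384, n_procs = 1)   # "--min-nsize=73M" "--min-vsize=4096M"
suggest_gc_flags(16384, n_procs = 16)  # "--min-nsize=4M"  "--min-vsize=256M"
```

The two calls show the issue Luke raises: the same hardware suggests very different defaults depending on how many processes share it.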

Best,

luke

On Thu, 15 Jan 2015, Michael Lawrence wrote:

> Just wanted to start a discussion on whether R could ship with more
> appropriate GC parameters. Right now, loading the recommended package
> Matrix leads to:
>
>> library(Matrix)
>> gc()
>          used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 1076796 57.6    1368491 73.1  1198505 64.1
> Vcells 1671329 12.8    2685683 20.5  1932418 14.8
>
> Results may vary, but here R needed about 64 MB of N cells and 15 MB of V
> cells (the "max used" columns) to load one of the most important packages.
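The "(Mb)" columns can be recovered from the cell counts. This sketch assumes a 64-bit build, where an Ncell takes 56 bytes and a Vcell 8 bytes (per R's `?Memory` help page). Rounding up to one decimal place reproduces every figure in the table above. `cells_to_mb` is a hypothetical helper, not an R function.

```r
# Hypothetical helper: convert cell counts to MB (2^20 bytes),
# rounding up to one decimal place.
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(10 * cells * bytes_per_cell / 2^20) / 10
}

# Counts reported by gc() after library(Matrix)
ncells <- c(used = 1076796, trigger = 1368491, max_used = 1198505)
vcells <- c(used = 1671329, trigger = 2685683, max_used = 1932418)

cells_to_mb(ncells, 56)  # 57.6 73.1 64.1
cells_to_mb(vcells, 8)   # 12.8 20.5 14.8
```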
>
> Currently, the default GC triggers are ~20 MB of N cells (on 64-bit
> systems) and ~6 MB of V cells. Martin Morgan found that this leads to a
> lot of GC overhead during package loading and, at least in our tests, can
> significantly increase the load time of complex packages.
>
> If we set the triggers on the command line high enough that
> library(Matrix) never reaches them (--min-vsize=2048M --min-nsize=45M),
> then we see:
>
>          used (Mb) gc trigger (Mb) max used  (Mb)
> Ncells 1076859 57.6   47185920 2520  6260069 334.4
> Vcells 1671431 12.8  268435456 2048  9010303  68.8
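The "gc trigger" values in this table follow directly from the two flags. This sketch assumes a 64-bit build (56-byte Ncells, 8-byte Vcells). It also assumes the size suffixes as I read them in R's `?Memory` help page: `G` = 2^30, `M` = 2^20, `K` = 2^10 and `k` = 1000. The table itself confirms that `M` = 2^20. `parse_mem_flag` is a hypothetical helper, not an R function.

```r
# Hypothetical helper: turn a size such as "45M" into a plain number.
parse_mem_flag <- function(x) {
  mult <- c(G = 2^30, M = 2^20, K = 2^10, k = 1000)
  suffix <- substring(x, nchar(x))
  if (suffix %in% names(mult)) {
    as.numeric(substring(x, 1, nchar(x) - 1)) * mult[[suffix]]
  } else {
    as.numeric(x)
  }
}

nsize <- parse_mem_flag("45M")    # --min-nsize counts Ncells
vsize <- parse_mem_flag("2048M")  # --min-vsize counts bytes
c(ncells = nsize, ncells_mb = nsize * 56 / 2^20,
  vcells = vsize / 8, vcells_mb = vsize / 2^20)
# 47185920 Ncells (2520 MB) and 268435456 Vcells (2048 MB),
# matching the "gc trigger" columns above
```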
>
> So by effectively disabling the GC, we let R consume about 335 MB of N
> cells and 70 MB of V cells, but loading goes a lot faster:
>
> Loading Matrix with default settings:
>> system.time(library(Matrix))
>   user  system elapsed
>  1.600   0.011   1.610
>
> With the high GC triggers above:
>> system.time(library(Matrix))
>   user  system elapsed
>  0.983   0.097   1.079
>
> Given modern hardware, and the need to load software efficiently before
> the user can do anything, perhaps we should raise the default settings so
> that the GC fires only sparingly when a large package is loaded.
>
> For users of Bioconductor, we see this for library(GenomicRanges):
>
>          used (Mb) gc trigger (Mb) max used  (Mb)
> Ncells 1322124 70.7   47185920 2520 15591302 832.7
> Vcells 1216015  9.3  268435456 2048 13992181 106.8
>
> So perhaps that user would want triggers of about 900 MB of N cells and
> 100 MB of V cells (corresponding to --min-vsize=100M --min-nsize=16M).
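A little arithmetic checks the correspondence between the target sizes and the flags. It assumes a 64-bit build (56-byte Ncells) and that the `M` suffix means 2^20, which the earlier "gc trigger" figures bear out. `nsize_from_mb` is a hypothetical helper name.

```r
# Hypothetical helper: how many 2^20-cell units of Ncells fit in a target
# number of MB, assuming 56-byte Ncells (64-bit build).
nsize_from_mb <- function(target_mb, ncell_bytes = 56) {
  # target_mb * 2^20 bytes / ncell_bytes per cell / 2^20 cells per "M"
  round(target_mb / ncell_bytes)
}

n_m <- nsize_from_mb(900)  # 16, i.e. --min-nsize=16M
n_m * 56                   # MB that 16M Ncells actually occupy: 896
# --min-vsize is already in bytes, so 100 MB of Vcells is --min-vsize=100M
```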
>
> Thoughts?
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


