[Rd] default min-v/nsize parameters
Henrik Bengtsson
hb at biostat.ucsf.edu
Tue Jan 20 19:58:08 CET 2015
Thanks for this.
Anyone know how I can find what those initial settings are from within
R? Do I need to parse/look at both environment variables R_NSIZE and
R_VSIZE and then commandArgs()?
/Henrik
On Tue, Jan 20, 2015 at 1:42 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>>> on Mon, 19 Jan 2015 08:50:08 -0800 writes:
>
> > Hi All, This is a very important issue. It would be very
> > sad to leave most users unaware of a free speedup of this
> > size. These options don't appear in the R --help
> > output. They really should be added there.
>
> Indeed, I've found that myself and had added them there about
> 24 hours ago.
> ((I think they were accidentally dropped a while ago))
>
> > If the garbage collector is working very hard, might it
> > emit a note about better settings for these variables?
>
> > It's not really my place to comment on design philosophy,
> > but if there is a configure option for small memory
> > machines I would assume that would be sufficient for the
> > folks that are not on fairly current hardware.
>
> There are quite a few more issues with this,
> notably how the growth *steps* are done.
> That has been somewhat experimental and for that reason is
> _currently_ quite configurable via R_GC_* environment variables,
> see the code in src/main/memory.c
>
> This is currently discussed "privately" within the R core.
> I'm somewhat confident that R 3.2.0 in April will have changes.
>
> And -- coming back to the beginning -- at least the "R-devel" version now shows
>
> R --help | grep -e min-.size
>
> --min-nsize=N Set min number of fixed size obj's ("cons cells") to N
> --min-vsize=N Set vector heap minimum to N bytes; '4M' = 4 MegaB
>
> --
> Martin Maechler, ETH Zurich
>
> > On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <nate at verse.com> wrote:
>
> >> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
> >> <lawrence.michael at gene.com> wrote:
> >> > Just wanted to start a discussion on whether R could ship with more
> >> > appropriate GC parameters.
> >>
> >> I've been doing a number of similar measurements, and have come to the
> >> same conclusion. R is currently very conservative about memory usage,
> >> and this leads to unnecessarily poor performance on certain problems.
> >> Changing the defaults to sizes that are more appropriate for modern
> >> machines can often produce a 2x speedup.
> >>
> >> On Sat, Jan 17, 2015 at 8:39 AM, <luke-tierney at uiowa.edu> wrote:
> >> > Martin Morgan discussed this a year or so ago and as I recall bumped
> >> > up these values to the current defaults. I don't recall details about
> >> > why we didn't go higher -- maybe Martin does.
> >>
> >> I just checked, and it doesn't seem that any of the relevant values
> >> have been increased in the last ten years. Do you have a link to the
> >> discussion you recall so we can see why the changes weren't made?
> >>
> >> > I suspect the main concern would be with small memory machines in
> >> > student labs and less developed countries.
> >>
> >> While a reasonable concern, I'm doubtful there are many machines for
> >> which the current numbers are optimal. The current minimum size
> >> increases for node and vector heaps are 40KB and 80KB respectively.
> >> This grows as the heap grows (min + .05 * heap), but still means that
> >> we do many more expensive garbage collections while growing than we
> >> need to. Paradoxically, the SMALL_MEMORY compile option (which is
> >> suggested for computers with up to 32MB of RAM) has slightly larger
> >> minimum increments, at 50KB and 100KB.
> >>
> >> I think we'd get significant benefit for most users by being less
> >> conservative about memory consumption. The exact sizes should be
> >> discussed, but with RAM costing about $10/GB it doesn't seem
> >> unreasonable to assume most machines running R have multiple GB
> >> installed, and those that don't will quite likely be running an OS
> >> that needs a custom compiled binary anyway.
> >>
> >> I could be way off, but a 10MB start with 1MB minimum increments
> >> for SMALL_MEMORY, a 100MB start with 10MB increments for
> >> NORMAL_MEMORY, and a 1GB start with 100MB increments for
> >> LARGE_MEMORY might be a reasonable spread.
> >>
> >> Or one could go even larger, noting that on most systems,
> >> overcommitted memory is not a problem until it is used. Until we
> >> write to it, it doesn't actually use physical RAM, just virtual
> >> address space. Or we could stay small, but make it possible to
> >> programmatically increase the granularity from within R.
> >>
> >> For ease of reference, here are the relevant sections of code:
> >>
> >> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
> >> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
> >> 217 #ifndef R_NSIZE
> >> 218 #define R_NSIZE 350000L
> >> 219 #endif
> >> 220 #ifndef R_VSIZE
> >> 221 #define R_VSIZE 6291456L
> >> 222 #endif
> >>
> >> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
> >> (ripley last authored on Jun 9, 2004)
> >> 157 Rp->vsize = R_VSIZE;
> >> 158 Rp->nsize = R_NSIZE;
> >> 166 #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
> >> 167 #define Max_Vsize R_SIZE_T_MAX /* unlimited */
> >> 169 #define Min_Nsize 220000
> >> 170 #define Min_Vsize (1*Mega)
> >>
> >> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
> >> (luke last authored on Nov 1, 2000)
> >> 335 #ifdef SMALL_MEMORY
> >> 336 /* On machines with only 32M of memory (or on a classic Mac OS port)
> >> 337 it might be a good idea to use settings like these that are more
> >> 338 aggressive at keeping memory usage down. */
> >> 339 static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
> >> 340 static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
> >> 341 static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
> >> 342 static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
> >> 343 #else
> >> 344 static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
> >> 345 static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
> >> 346 static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
> >> 347 static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
> >> 348 #endif
> >>
> >> static void AdjustHeapSize(R_size_t size_needed)
> >> {
> >>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
> >>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
> >>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
> >>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
> >>         size_needed + R_MinVFree;
> >>     double node_occup = ((double) NNeeded) / R_NSize;
> >>     double vect_occup = ((double) VNeeded) / R_VSize;
> >>
> >>     if (node_occup > R_NGrowFrac) {
> >>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac
> >>                                      * R_NSize);
> >>         if (R_MaxNSize >= R_NSize + change)
> >>             R_NSize += change;
> >>     }
> >>     else if (node_occup < R_NShrinkFrac) {
> >>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
> >>         if (R_NSize < NNeeded)
> >>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded: R_MaxNSize;
> >>         if (R_NSize < orig_R_NSize)
> >>             R_NSize = orig_R_NSize;
> >>     }
> >>
> >>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
> >>         R_VSize = VNeeded;
> >>     if (vect_occup > R_VGrowFrac) {
> >>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac
> >>                                      * R_VSize);
> >>         if (R_MaxVSize - R_VSize >= change)
> >>             R_VSize += change;
> >>     }
> >>     else if (vect_occup < R_VShrinkFrac) {
> >>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
> >>         if (R_VSize < VNeeded)
> >>             R_VSize = VNeeded;
> >>         if (R_VSize < orig_R_VSize)
> >>             R_VSize = orig_R_VSize;
> >>     }
> >>
> >>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
> >> }
> >>
> >> Rp->nsize is overridden at startup by environment variable R_NSIZE if
> >> Min_Nsize <= $R_NSIZE <= Max_Nsize. Rp->vsize is overridden at
> >> startup by environment variable R_VSIZE if Min_Vsize <= $R_VSIZE <=
> >> Max_Vsize. These are then used to set the global variables R_NSize
> >> and R_VSize with R_SetMaxVSize(Rp->max_vsize).
> >>
>