[Rd] default min-v/nsize parameters
Martin Maechler
maechler at stat.math.ethz.ch
Tue Jan 20 10:42:27 CET 2015
>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>> on Mon, 19 Jan 2015 08:50:08 -0800 writes:
> Hi All, This is a very important issue. It would be very
> sad to leave most users unaware of a free speedup of this
> size. These options don't appear in the R --help
> output. They really should be added there.
Indeed, I've found that myself and had added them there about
24 hours ago.
((I think they were accidentally dropped a while ago))
> if the garbage collector is working very hard, might it
> emit a note about better settings for these variables?
> It's not really my place to comment on design philosophy,
> but if there is a configure option for small-memory
> machines, I would assume that would be sufficient for the
> folks who are not on fairly current hardware.
There are quite a few more issues with this,
notably how the growth *steps* are done.
That has been somewhat experimental, and for that reason it is
_currently_ quite configurable via R_GC_* environment variables;
see the code in src/main/memory.c.
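For illustration only: with the current R-devel sources, tuning could
look like

  R_GC_NGROWINCRFRAC=0.2 R_GC_VGROWINCRFRAC=0.2 R --vanilla

i.e., grow each heap by 20% per expansion instead of the default 5% --
but treat both the variable names (as read in src/main/memory.c) and
the values as unverified examples, not recommendations.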
This is currently being discussed "privately" within the R core team.
I'm somewhat confident that R 3.2.0 in April will have changes.
And -- coming back to the beginning -- at least the "R-devel" version now shows
R --help | grep -e min-.size
--min-nsize=N Set min number of fixed size obj's ("cons cells") to N
--min-vsize=N Set vector heap minimum to N bytes; '4M' = 4 MegaB
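A quick way to try the options (values illustrative and untested; per
the help text above, --min-vsize takes bytes with an optional 'M'
suffix):

  R --min-nsize=1000000 --min-vsize=64M --vanilla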
--
Martin Maechler, ETH Zurich
> On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <nate at verse.com> wrote:
>> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > Just wanted to start a discussion on whether R could ship with more
>> > appropriate GC parameters.
>>
>> I've been doing a number of similar measurements, and have come to the
>> same conclusion. R is currently very conservative about memory usage,
>> and this leads to unnecessarily poor performance on certain problems.
>> Changing the defaults to sizes that are more appropriate for modern
>> machines can often produce a 2x speedup.
>>
>> On Sat, Jan 17, 2015 at 8:39 AM, <luke-tierney at uiowa.edu> wrote:
>> > Martin Morgan discussed this a year or so ago and as I recall bumped
>> > up these values to the current defaults. I don't recall details about
>> > why we didn't go higher -- maybe Martin does.
>>
>> I just checked, and it doesn't seem that any of the relevant values
>> have been increased in the last ten years. Do you have a link to the
>> discussion you recall so we can see why the changes weren't made?
>>
>> > I suspect the main concern would be with small memory machines in
>> > student labs and less developed countries.
>>
>> While a reasonable concern, I'm doubtful there are many machines for
>> which the current numbers are optimal. The current minimum growth
>> increments for the node and vector heaps are 40,000 nodes and 80,000
>> Vcells respectively. The increment grows with the heap
>> (min + .05 * heap), but we still do many more expensive garbage
>> collections while growing than we need to. Paradoxically, the
>> SMALL_MEMORY compile option (which is suggested for computers with up
>> to 32MB of RAM) has slightly larger minimum increments of 50,000 and
>> 100,000.
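>>
>> To put a number on that cost, here is a back-of-the-envelope count of
>> the expansions needed to grow the default vector heap to 1GB,
>> assuming R_VSIZE is in bytes, Vcells are 8 bytes, and the
>> NORMAL_MEMORY constants quoted below (the 1GB target is arbitrary):
>>
>>     Rscript -e 'v <- 6291456/8; n <- 0; while (v < 2^30/8) { v <- v + 80000 + 0.05*v; n <- n + 1 }; n'
>>
>> On those assumptions it takes about 83 expansions, each typically
>> preceded by a full collection.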
>>
>> I think we'd get significant benefit for most users by being less
>> conservative about memory consumption. The exact sizes should be
>> discussed, but with RAM costing about $10/GB it doesn't seem
>> unreasonable to assume most machines running R have multiple GB
>> installed, and those that don't will quite likely be running an OS
>> that needs a custom compiled binary anyway.
>>
>> I could be way off, but a 10MB start with 1MB minimum increments for
>> SMALL_MEMORY, a 100MB start with 10MB increments for NORMAL_MEMORY,
>> and a 1GB start with 100MB increments for LARGE_MEMORY might be a
>> reasonable spread.
>>
>> Or one could go even larger, noting that on most systems,
>> overcommitted memory is not a problem until it is used. Until we
>> write to it, it doesn't actually use physical RAM, just virtual
>> address space. Or we could stay small, but make it possible to
>> programmatically increase the granularity from within R.
>>
>> For ease of reference, here are the relevant sections of code:
>>
>> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
>> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
>> 217 #ifndef R_NSIZE
>> 218 #define R_NSIZE 350000L
>> 219 #endif
>> 220 #ifndef R_VSIZE
>> 221 #define R_VSIZE 6291456L
>> 222 #endif
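>>
>> (For scale: assuming the 56-byte cons cells of a 64-bit build as
>> described in ?Memory, 350000 nodes is about 20MB, and 6291456 bytes
>> is 6MB of vector heap.)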
>>
>> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
>> (ripley last authored on Jun 9, 2004)
>> 157 Rp->vsize = R_VSIZE;
>> 158 Rp->nsize = R_NSIZE;
>> 166 #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
>> 167 #define Max_Vsize R_SIZE_T_MAX /* unlimited */
>> 169 #define Min_Nsize 220000
>> 170 #define Min_Vsize (1*Mega)
>>
>> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
>> (luke last authored on Nov 1, 2000)
>> 335 #ifdef SMALL_MEMORY
>> 336 /* On machines with only 32M of memory (or on a classic Mac OS port)
>> 337 it might be a good idea to use settings like these that are more
>> 338 aggressive at keeping memory usage down. */
>> 339 static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
>> 340 static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
>> 341 static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
>> 342 static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
>> 343 #else
>> 344 static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
>> 345 static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
>> 346 static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
>> 347 static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
>> 348 #endif
>>
>> static void AdjustHeapSize(R_size_t size_needed)
>> {
>>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
>>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
>>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
>>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
>>         size_needed + R_MinVFree;
>>     double node_occup = ((double) NNeeded) / R_NSize;
>>     double vect_occup = ((double) VNeeded) / R_VSize;
>>
>>     /* grow the node heap by R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize
>>        once occupancy passes R_NGrowFrac ... */
>>     if (node_occup > R_NGrowFrac) {
>>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
>>         if (R_MaxNSize >= R_NSize + change)
>>             R_NSize += change;
>>     }
>>     /* ... and shrink it again once occupancy falls below R_NShrinkFrac */
>>     else if (node_occup < R_NShrinkFrac) {
>>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
>>         if (R_NSize < NNeeded)
>>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded : R_MaxNSize;
>>         if (R_NSize < orig_R_NSize)
>>             R_NSize = orig_R_NSize;
>>     }
>>
>>     /* the vector heap follows the same scheme, with one shortcut:
>>        jump straight to VNeeded when already past 100% occupancy */
>>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
>>         R_VSize = VNeeded;
>>     if (vect_occup > R_VGrowFrac) {
>>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
>>         if (R_MaxVSize - R_VSize >= change)
>>             R_VSize += change;
>>     }
>>     else if (vect_occup < R_VShrinkFrac) {
>>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
>>         if (R_VSize < VNeeded)
>>             R_VSize = VNeeded;
>>         if (R_VSize < orig_R_VSize)
>>             R_VSize = orig_R_VSize;
>>     }
>>
>>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
>> }
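>>
>> To watch this policy in action, gcinfo(TRUE) in base R makes every
>> collection print a summary line; e.g. this illustrative one-liner
>> shows the trigger sizes creeping up by the small increments above:
>>
>>     Rscript -e 'gcinfo(TRUE); x <- list(); for (i in 1:100) x[[i]] <- numeric(1e6)'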
>>
>> Rp->nsize is overridden at startup by environment variable R_NSIZE if
>> Min_Nsize <= $R_NSIZE <= Max_Nsize. Rp->vsize is overridden at
>> startup by environment variable R_VSIZE if Min_Vsize <= $R_VSIZE <=
>> Max_Vsize. These are then used to set the global variables R_NSize
>> and R_VSize, with R_SetMaxVSize(Rp->max_vsize).
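>> For example (values illustrative only, chosen to fall within the
>> Min_*/Max_* bounds above):
>>
>>     R_NSIZE=2000000 R_VSIZE=200000000 R --vanilla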
>>