[Rd] Moderating consequences of garbage collection when in C
dhinds at sonic.net
Mon Nov 14 22:12:32 CET 2011
Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 11/14/2011 11:47 AM, dhinds at sonic.net wrote:
> > dhinds at sonic.net wrote:
> >> Martin Morgan<mtmorgan at fhcrc.org> wrote:
> >
> > I had done some Google searches on this issue, since it seemed like it
> > should not be too uncommon, but the only other hit I could come up
> > with was a thread from 2006:
> >
> > https://stat.ethz.ch/pipermail/r-devel/2006-November/043446.html
> >
> > In any case, one issue with your suggested workaround is that it
> > requires knowing how much additional storage is needed, which may be
> > an expensive operation to determine. I've just tried implementing a
> > different approach, which is to define two new functions to either
> > disable or enable GC. The function to disable GC first invokes
> > R_gc_full() to shrink the heap as much as possible, then sets a flag.
> > Then in R_gc_internal(), I first check that flag, and if it is set, I
> > call AdjustHeapSize(size_needed) and exit immediately.
> I think this is a better approach; mine seriously understated the
> complexity of figuring out required size.
> > These calls could be used to bracket any code section that expects to
> > make lots of calls to R's memory allocator. The downside is that
> > every path out of such a code section (including error handling)
> > must take care to unset the GC-disabled flag. I think I would want
> > to hear from someone on the R team about whether they think this is
> > a good idea.
> >
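To make the flag idea above concrete, here is a toy model in C. It is only
a sketch under stated assumptions: every toy_* name and the heap counters
are hypothetical stand-ins, not R's real internals, and the "collection"
simply frees everything. What it shows is the control flow described above:
the disable call collects once and sets a flag, and the collector entry
point then grows the heap and returns without collecting.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for heap bookkeeping; none of these are R's real
   variables or functions. */
static size_t heap_size = 1000;  /* current heap capacity, in cells */
static size_t heap_used = 0;     /* cells currently allocated */
static int gc_disabled = 0;      /* the GC-disabled flag */
static int collections = 0;      /* number of full collections run */

/* Stand-in for a full collection: here it simply frees everything. */
static void toy_full_gc(void) {
    collections++;
    heap_used = 0;
}

/* Stand-in for a heap-size adjustment: grow until the request fits. */
static void toy_adjust_heap_size(size_t size_needed) {
    if (heap_used + size_needed > heap_size)
        heap_size = 2 * (heap_used + size_needed);
}

/* Collector entry point: with the flag set, grow the heap and return
   immediately instead of collecting. */
static void toy_gc_internal(size_t size_needed) {
    if (gc_disabled) {
        toy_adjust_heap_size(size_needed);
        return;
    }
    toy_full_gc();
    toy_adjust_heap_size(size_needed);
}

/* Allocator: asks the collector for room only when the heap is full. */
static void toy_alloc(size_t n) {
    if (heap_used + n > heap_size)
        toy_gc_internal(n);
    assert(heap_used + n <= heap_size);
    heap_used += n;
}

/* Disable: collect once to shrink the heap, then set the flag. */
void toy_gc_disable(void) { toy_full_gc(); gc_disabled = 1; }

/* Enable: clear the flag; the next full heap triggers a collection. */
void toy_gc_enable(void) { gc_disabled = 0; }
```

Bracketing a burst of toy_alloc() calls with toy_gc_disable() and
toy_gc_enable() leaves the collection counter unchanged during the burst,
at the price of a heap that only grows until GC is enabled again.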
> Another place where this comes up is during package load, especially for
> packages with many S4 instances.
Do you know if this is all happening inside a C function that could
handle disabling and enabling GC? Or would it require doing this at
the R level? For testing, I am turning GC on and off at the R level,
but I am still thinking about where we would need to check for failures
to re-enable GC. I suppose one approach would be to provide an R wrapper
that evaluates an expression with GC disabled, using tryCatch to
guarantee that it exits with GC enabled.
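The same guarantee would be needed for C callers. As a rough sketch of the
pattern (not R code, and not how R implements errors), the toy below uses
setjmp/longjmp as a stand-in for an error's non-local exit. All names here
(with_gc_disabled, toy_error, body_ok, body_fails) are hypothetical. The
wrapper restores the flag on both the normal and the error path, which is
the property the tryCatch wrapper would provide at the R level.

```c
#include <assert.h>
#include <setjmp.h>

static int gc_disabled = 0;    /* toy flag, not an R variable */
static jmp_buf error_target;   /* where a simulated error lands */

/* Simulated error: a non-local exit out of the running body. */
static void toy_error(void) { longjmp(error_target, 1); }

/* Run body(data) with the flag set and restore the previous value on
   both normal and error exits. Returns 0 on success, 1 on error.
   Not reentrant: nested calls would overwrite error_target. */
int with_gc_disabled(void (*body)(void *), void *data) {
    const int saved = gc_disabled;
    gc_disabled = 1;
    if (setjmp(error_target) == 0) {
        body(data);
        gc_disabled = saved;   /* normal exit */
        return 0;
    }
    gc_disabled = saved;       /* error exit */
    return 1;
}

/* Example body that succeeds: counts its calls via data. */
static void body_ok(void *data) {
    assert(gc_disabled);       /* the flag is set while the body runs */
    ++*(int *)data;
}

/* Example body that fails partway through. */
static void body_fails(void *data) {
    ++*(int *)data;
    toy_error();
}
```

Calling with_gc_disabled(body_fails, &n) returns 1 and still leaves the
flag cleared, so a failure inside the bracketed section cannot leave GC
switched off.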
> > system.time(as.character(1:10000000))
>    user  system elapsed
>  61.908   0.297  62.303
I get 6 seconds for this with GC disabled.
> There's a hierarchy of CHARSXP / STRSXP, so maybe that could be
> exploited in the mark phase?
I haven't explored whether GC could be made smarter so that this isn't
as big a hit. I don't really understand the GC process.
-- Dave