[Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism

Iñaki Úcar i.ucar86 at gmail.com
Tue Mar 27 11:53:58 CEST 2018


2018-03-27 11:11 GMT+02:00 Tomas Kalibera <tomas.kalibera at gmail.com>:
> On 03/27/2018 09:51 AM, Iñaki Úcar wrote:
>>
>> 2018-03-27 6:02 GMT+02:00  <luke-tierney at uiowa.edu>:
>>>
>>> This has nothing to do with printing or dispatch per se. It is the
>>> result of an internal register (R_ReturnedValue) being protected. It
>>> gets rewritten whenever there is a jump, e.g. by an explicit return
>>> call. So a simplified example is
>>>
>>> new_foo <- function() {
>>>   e <- new.env()
>>>   reg.finalizer(e, function(e) message("Finalizer called"))
>>>   e
>>> }
>>>
>>> bar <- function(x) return(x)
>>>
>>> bar(new_foo())
>>> gc() # still in .Last.value
>>> gc() # nothing
>>>
>>> UseMethod essentially does a return call so you see the effect there.
>>
>> Understood. Thanks for the explanation, Luke.
>>
>>> The R_ReturnedValue register could probably be safely cleared in more
>>> places but it isn't clear exactly where. As things stand it will be
>>> cleared on the next use of a non-local transfer of control, and those
>>> happen frequently enough that I'm not convinced this is worth
>>> addressing, at least not at this point in the release cycle.
>>
>> I barely know the R internals, and I'm sure there's a good reason
>> behind this change (R 3.2.3 does not show this behaviour), but IMHO
>> it's, at the very least, confusing. When .Last.value is cleared, that
>> object loses the last reference, and I'd expect it to be eligible for
>> gc.
>>
>> In my case, I was using an object that internally generates a bunch of
>> data. I discovered this because I was benchmarking the execution, and
>> I was running out of memory because the memory wasn't being freed as it
>> was supposed to be. So I spent half of the day on this because I thought
>> I had a memory leak. :-\ (Not blaming anyone here, of course; just
>> making a case to show that this may be worth addressing at some
>> point). :-)
>
> From the perspective of the R user/programmer/package developer, please do
> not make any assumptions on when finalizers will be run, only that they
> indeed won't be run when the object is still alive. Similarly, it is not
> good to assume that "gc()" will actually run a collection (or a
> particular type of collection, or that it will run immediately, etc.). Such
> guarantees would overly restrict the design space and potential
> optimizations on the R internals side - and for this reason are typically
> not given in other managed languages, either. I've seen R examples where
> most of the time was wasted tracing live objects because an explicit "gc()"
> was run in a tight loop. Note that in Java, for instance, an explicit call to
> gc() was eventually turned into a hint only.
>
> Once you start debugging when objects are collected, you are debugging R
> internals - and surprises/changes between svn versions/etc should be
> expected as well as changes in behavior caused very indirectly by code
> changes somewhere else. I work on R internals and spend most of my time
> debugging - that is unfortunately normal when you work on a language
> runtime. Indeed, the runtime should try not to keep references to objects
> for too long, but it remains to be seen whether and for what cost this could
> be fixed with R_ReturnedValue.

To be precise, I was not debugging *when* objects were collected, I
was debugging *whether* objects were collected. And for that, I
necessarily need some hint about the *when*.

But I think that's another discussion. My point is that, as an R user
and package developer, I expect consistency, and currently

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

behaves differently than

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) x

bar(new_foo())
gc() # still in .Last.value
gc() # Finalizer called!

And such a difference is not explained (AFAIK) in the documentation.
At least the help page for 'return' gives no hint that the behaviour
may differ depending on whether or not I write an explicit 'return'.
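
For anyone who wants to compare the two variants side by side, here is a minimal sketch. The names new_tracked, flush_register and log are hypothetical helpers made up for this illustration, not anything from R itself. Instead of printing a message, each finalizer sets a flag, so the result can be inspected afterwards. The extra call to a function with an explicit return is only an assumption based on Luke's description that the register is rewritten on the next jump; I have not verified that it releases the object on every R version, and, as Tomas points out, gc() gives no guarantee that a finalizer has run by the time it returns.

```r
# Hypothetical helper (not from the thread): creates an environment whose
# finalizer records that it ran by setting a flag named `tag` in `log`.
new_tracked <- function(log, tag) {
  e <- new.env()
  reg.finalizer(e, function(e) assign(tag, TRUE, envir = log))
  e
}

log <- new.env()

with_return    <- function(x) return(x)  # explicit return, as in the first example
without_return <- function(x) x          # implicit return, as in the second example

# invisible() keeps the results from being printed at top level.
invisible(with_return(new_tracked(log, "explicit")))
invisible(without_return(new_tracked(log, "implicit")))

# Assumption: per the explanation above, another explicit return() should
# overwrite the internal register that still points at the first object.
flush_register <- function() return(NULL)
invisible(flush_register())

invisible(gc())

# Which finalizers have run so far. The answer may differ between R versions,
# and a FALSE here does not by itself prove that the object is still alive.
ran <- vapply(c("explicit", "implicit"), exists, logical(1),
              envir = log, inherits = FALSE)
print(ran)
```

Dropping the flush_register() call and comparing the two flags is one way to see whether a given R build shows the difference described above.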

Regards,
Iñaki

>
> Best
> Tomas
>


