[Rd] modifying large R objects in place

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Sep 28 17:46:25 CEST 2007


Duncan Murdoch wrote:
> On 9/28/2007 7:45 AM, Petr Savicky wrote:
>   
>> On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
>>     
>   ...
>   
>>> Longer-term, I still have some hope for better reference counting, but 
>>> the semantics of environments make it really ugly -- an environment can 
>>> contain an object that contains the environment, a simple example being 
>>>
>>> f <- function()
>>>    g <- function() 0
>>> f()
>>>
>>> At the end of f(), we should decide whether to destroy f's evaluation 
>>> environment. In the present example, what we need to be able to see is 
>>> that this would remove all references to g and that the reference from g 
>>> to f can therefore be ignored.  Complete logic for sorting this out is 
>>> basically equivalent to a new garbage collector, and one can suspect 
>>> that applying the logic upon every function return is going to be 
>>> terribly inefficient. However, partial heuristics might apply.
>>>       
>> I have to say that I do not understand the example very much.
>> What is the input and output of f? Is g inside only defined or
>> also used?
>>     
>
> f has no input; its output is the function g, whose environment is the 
> evaluation environment of f.  g is never used, but it is returned as the 
> value of f.  Thus we have the loop:
>
> g refers to the environment.
> the environment contains g.
>
> Even though the result of f() was never saved, two things (the 
> environment and g) got created and each would have non-zero reference 
> count.
>
> In a more complicated situation you might want to save the result of the 
> function and then modify it.  But because of the loop above, you would 
> always think there's another reference to the object, so every in-place 
> modification would require a copy first.
>
>   
Thanks Duncan. It was way past my bedtime when I wrote that...

I had actually missed the point about the return value, but the point 
remains even if you let f return something other than g: You get a 
situation where the two objects both have a refcount of 1, so by 
standard refcounting semantics neither can be removed even though 
neither object is reachable.

Put differently, standard refcounting assumes that references between 
objects of the language form a directed acyclic graph, but when 
environments are involved, there can be cycles in R-like languages.
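
As a minimal sketch of such a cycle: make_cycle below is a hypothetical 
variant of f that returns 0 instead of g, and the finalizer is there only 
to report when the evaluation environment actually goes away.

```r
# Hypothetical variant of f: returns 0, so g never escapes the call.
make_cycle <- function() {
  g <- function() 0
  # g's enclosing environment is this call's evaluation environment ...
  stopifnot(identical(environment(g), environment()))
  # ... and that environment holds the binding for g: a two-node cycle.
  stopifnot(exists("g", envir = environment(), inherits = FALSE))
  # Report when the evaluation environment is reclaimed.
  reg.finalizer(environment(), function(e) cat("frame reclaimed\n"))
  0
}

make_cycle()
invisible(gc())   # tracing collection; the finalizer may run here
```

Under plain refcounting the environment and g would keep each other alive 
after the call returns. A tracing collector can reclaim both, so the 
message is typically printed at the gc() call, although R gives no firm 
guarantee about when finalizers run.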

    -p

> Duncan Murdoch


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


