[Rd] Copy on assignment to large field of reference class

John Chambers jmc at r-project.org
Sun May 19 23:59:33 CEST 2013


This is a useful observation.  To talk about it, though, we need to re-express it in terms that make sense for R; there are too many misconceptions otherwise.

The basic observation is this:  When simple subset or element replacement is done in a loop, the object is normally copied only on the first pass through the loop.  This is true whether using local assignment, <-, or global assignment, <<-.
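
A minimal sketch of the ordinary (non-field) case, assuming an R build with memory profiling enabled so that tracemem() is available. The variable name x is chosen here for illustration, and how many copies tracemem reports may differ between R versions.

```r
# Plain numeric vector, modified element by element in a loop.
x <- as.numeric(1:10000)

# tracemem() only works if R was built with memory profiling support.
if (capabilities("profmem")) invisible(tracemem(x))

# Each pass calls the primitive `[<-`; any duplication of x shows up
# as a "tracemem[...]" line on the console.
for (i in 2:length(x)) x[i] <- x[i - 1] + 1

if (capabilities("profmem")) untracemem(x)
```

If the loop printed a tracemem line on every pass, the vector would be getting duplicated each time, which is the behaviour reported for the field assignment.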

However, if global assignment is used in a method to replace part of a field, the object is copied on every pass.  For long loops this adds substantial overhead, so the observation is very relevant.

What's going on?

The non-copying depends on the fact that `[<-` is a primitive function.

When a field is declared with a class ("vector" in the example), its assignment is done by an R function that checks the validity (via what's called an "active binding" in R).  That causes the extra copy on each assignment.  (To be honest, I don't totally understand why, but I have no intention of messing with the active binding code.)
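
To make the term concrete, here is a small sketch of an active binding that validates each assignment. It is a hypothetical illustration built on makeActiveBinding(), not the code that setRefClass() uses; the names env and checked_b and the is.numeric() check are assumptions made for the example.

```r
# Hypothetical illustration only; not the code used by setRefClass().
env <- new.env()

# One function acts as getter (called with no argument) and as
# setter (called with the new value).
checked_b <- local({
  value <- numeric(0)              # hidden storage for the "field"
  function(v) {
    if (missing(v)) return(value)  # read access
    if (!is.numeric(v)) stop("invalid assignment: 'b' must be numeric")
    value <<- v                    # write access, after the check
    invisible(value)
  }
})

makeActiveBinding("b", checked_b, env)

env$b <- 1:5       # runs checked_b(1:5)
env$b[2] <- 10L    # reads through the getter, then writes the whole vector back
bindingIsActive("b", env)
```

The point of the sketch is that every element replacement goes through an ordinary R function call to store the whole object, rather than being handled entirely by the primitive.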

What to do about it?

There are two solutions.  Either accept that field assignment is basically inefficient and avoid doing it in a loop, as in method modb2.

Or don't declare a class for the field, in which case no active binding is used.  Check this out by changing the class definition to setRefClass("A", fields="b").
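
A sketch of that second variant, adapted from the example in the original message. The class name "A2" and object name a2 are chosen here only to avoid clashing with the original definition; whether the per-iteration copies disappear can be checked by calling tracemem(a2$b) before a2$modb1().

```r
library(methods)

# Same example, but the field is given by name only, with no declared class.
A2 <- setRefClass("A2", fields = "b")

A2$methods(
  initialize = function() {
    b <<- 1:10000
  },
  modb1 = function() {
    # same element-by-element replacement as in the original modb1
    for (i in 2:length(b)) b[i] <<- b[i - 1] + 1
    invisible(.self)
  }
)

a2 <- A2$new()
a2$modb1()
```

The cost of this variant is that nothing checks what gets assigned to b.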

I prefer the first solution since it retains the validity check on the field.

John


PS: A few comments.
 - It makes no sense to expect _greater_ efficiency than for a simple assignment.  The object in a$b is NOT a reference object so its manipulation obeys R's normal rules.
 - All this only applies to replacement functions that are primitives.  Otherwise you're stuck with copies each time.
 - Please don't use the term "call by value" for R; that's not how R's evaluation works and has nothing to do with when duplication takes place.  That topic is not for the faint of heart, but basically when R knows that there is only one reference to an object, it doesn't copy.  But in practice this is mainly when a primitive replacement function is used.
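
To see which kind of replacement function is involved, is.primitive() can be applied to it. A small sketch, where `second<-` is a hypothetical user-defined replacement function written for this illustration:

```r
# A replacement function written in R (a closure, not a primitive).
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- c(1, 2, 3)
second(x) <- 20            # calls `second<-`(x, value = 20)

is.primitive(`[<-`)        # TRUE
is.primitive(`second<-`)   # FALSE
```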


On May 18, 2013, at 2:50 PM, Giles Percy <giles.percy at gmail.com> wrote:

> Dear all
> 
> I am trying to find the best way to handle large fields in reference
> classes.
> 
> As the code below shows assignment via <<- causes many copies to be made if
> the subsetting is extensive (in modb1). This can cause R to run out of
> memory. Creating a local copy and using the optimisation in <- is the best
> solution I have found so far (in modb2) - but it is not really much better
> than ordinary functions using call by value and then reassigning.
> 
> Is there a reason why optimisation does not occur for <<- ? Or is there a
> better solution for reference classes?
> 
> Regards
> Giles
> 
> A <- setRefClass("A", fields=list(b="vector"))
> 
> A$methods(
>   initialize=function() {
>     b <<- 1:10000
>   },
>   modb1=function() {
>     # simple subsetting for illustration
>     for(i in 2:length(b)) b[i] <<- b[i-1] + 1
>   },
>   modb2=function() {
>     bb <- b
>     for(i in 2:length(b)) bb[i] <- bb[i-1] + 1
>     b <<- bb
>   }
> )
> a <- new("A")
> tracemem(a$b)
> 
> a$modb1()
> 
> a$modb2()