[Rd] Small changes to big objects (1)

John Chambers jmc at r-project.org
Fri Jan 4 18:04:51 CET 2013


One point that came up in the CRAN checks, that I should have made explicit:

The new version of "@<-" has to move from the "methods" package to "base".

Therefore you should not (and cannot) explicitly import it from
"methods"--that will fail in the import phase of installation.
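
A quick way to check where the function lives is sketched below. It assumes
R 3.0.0 or later (or an r-devel build with the change) and uses only base
functions; the variable names are mine. In a package, the practical
consequence is simply to drop any NAMESPACE directive that imports "@<-"
from "methods".

```r
# Assumes R >= 3.0.0, where `@<-` is expected to be a primitive in base.
at_assign <- get("@<-", envir = baseenv())

# TRUE if `@<-` is a primitive rather than a closure.
is_prim <- is.primitive(at_assign)

# TRUE if `@<-` is found directly in base, without searching parents.
in_base <- exists("@<-", envir = baseenv(), inherits = FALSE)

print(c(primitive = is_prim, in_base = in_base))
```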

John

On 1/3/13 11:08 AM, John Chambers wrote:
> Martin Morgan commented in email to me that a change to any slot of an
> object that has other, large slot(s) does substantial computation,
> presumably from copying the whole object.  Is there anything to be done?
>
> There are in fact two possible changes, one automatic but only partial,
> the other requiring some action on the programmer's part.  Herewith the
> first; I'll discuss the second in a later email.
>
> Some context:  The notion is that our object has some big data and some
> additional smaller things.  We need to change the small things but would
> rather not copy the big things all the time.  (With long vectors, this
> becomes even more relevant.)
>
> There are three likely scenarios: slots, attributes and named list
> components.  Suppose our object has "little" and "BIG" encoded in one of
> these.
>
> The three relevant computations are:
>
> x@little <- other
> attr(x, "little") <- other
> x$little <- other
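
For concreteness, the three forms can be set up as in the following sketch.
The class name "Holder", the object names and the sizes are hypothetical,
chosen only to give each object a small "little" part and a large "BIG" part.

```r
library(methods)

# Hypothetical class and data, for illustration only.
setClass("Holder", representation(little = "numeric", BIG = "numeric"))

other <- 42

# 1. Slots of an S4 object
x1 <- new("Holder", little = 1, BIG = numeric(1e6))
x1@little <- other

# 2. Attributes
x2 <- structure(list(), little = 1, BIG = numeric(1e6))
attr(x2, "little") <- other

# 3. Named list components
x3 <- list(little = 1, BIG = numeric(1e6))
x3$little <- other
```

In each case only the small part changes; the question in the message is how
often the large part gets copied along the way.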
>
> It turns out that these are all similar in behavior with one important
> exception--fixing that is the automatic change.
>
> I need to review what R does here. All these are replacement functions,
> `@<-`, `attr<-`, `$<-`.  The evaluator checks before calling any
> replacement whether the object needs to be duplicated (in a routine
> EnsureLocal()).  It does that by examining a special field that holds
> the reference status of the object.
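
To make "these are replacement functions" concrete: a replacement expression
is evaluated roughly as a call to the corresponding function followed by a
reassignment of the result. The sketch below calls `$<-` and `attr<-`
directly and compares the results with the usual syntax. The object names
are made up for illustration.

```r
x <- list(little = 1, BIG = numeric(10))
other <- 2

# Usual replacement syntax on a copy of x ...
y1 <- x
y1$little <- other

# ... and the explicit call to the replacement function.
y2 <- `$<-`(x, "little", other)

# Same idea for attributes.
a1 <- x
attr(a1, "little") <- other
a2 <- `attr<-`(x, "little", other)

same_dollar <- identical(y1, y2)
same_attr <- identical(a1, a2)
```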
>
> Some languages, such as Python (and S), keep reference counts for each
> object, de-allocating the object when the reference count drops back to
> zero.  R uses a different strategy. Its NAMED() field is 0, 1 or 2
> according to whether the object has been assigned never, once or more
> than once.  The field is not a reference count and is not
> decremented--relevant for this issue.  Objects are de-allocated only
> when garbage collection occurs and the object does not appear in any
> current frame or other context.
> (I did not write any of this code, so apologies if I'm misrepresenting it.)
>
> When any of these replacement operations first occurs for a particular
> object in a particular function call, it's very likely that the
> reference status will be 2 and EnsureLocal will duplicate it--all of it,
> regardless of which of the three forms is used.
>
> Here the non-level-playing-field aspect comes in.  `@<-` is a normal R
> function (a "closure") but the other two are primitives in the main code
> for R.  Primitives have no frame in which arguments are stored.  As a
> result the new version of x is normally stored with status 1.
>
> If one does a second replacement in the same call (in a loop, e.g.) that
> should not normally copy again.  But the result of `@<-` will be an
> object from its frame and will have status 2 when saved, forcing a copy
> each time.
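
One way to observe this is with tracemem(), as in the sketch below. It
assumes an R build with memory profiling enabled (checked via
capabilities("profmem")); the class "Holder" and the function update_little()
are hypothetical. How many duplications get reported depends on the R
version, so no expected output is shown.

```r
library(methods)

# Hypothetical class, for illustration only.
setClass("Holder", representation(little = "numeric", BIG = "numeric"))

update_little <- function(obj, n) {
  for (i in seq_len(n)) {
    obj@little <- i   # repeated slot replacement within one call
  }
  obj
}

x <- new("Holder", little = 0, BIG = numeric(1e6))

# tracemem() prints a message each time the traced object is duplicated,
# but only if R was built with memory profiling; skip tracing otherwise.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(x))
x <- update_little(x, 3)
if (can_trace) untracemem(x)
```

Comparing the messages under a closure `@<-` and under a primitive `@<-`
should show whether the loop copies once or on every iteration.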
>
> So the change, naturally, is that R 3.0.0 will have a primitive
> implementation of `@<-`.  This has been implemented in r-devel (rev. 61544).
>
> Please try it out _before_ we issue that version, especially if you own
> a package that does things related to this question.
>
> John
>
> PS:  Some may have noticed that I didn't mention a fourth approach:
> fields in a reference class object.  The assumption was that we wanted
> classical, functional behavior here.  Reference classes don't have the
> copy problem but don't behave functionally either.  But that is in fact
> the direction for the other approach.  I'll discuss that later, when the
> corresponding code is available.