[Rd] Small changes to big objects (1)

John Chambers jmc at r-project.org
Mon Jan 7 20:20:01 CET 2013

On 1/7/13 9:59 AM, Douglas Bates wrote:
> Is there a difference in the copying behavior of
> x@little <- other
> and
> x@little[] <- other

Not in the direction you were hoping, as far as I can tell.

Nested replacement expressions in R and S are unraveled and done as 
repeated simple replacements.  So either way you end up with, in effect
   x@little <- something
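
As a rough illustration (a sketch of the idea, not the evaluator's actual code path), the nested form can be written out by hand as two simple replacements. The class name Holder and the temporary tmp_little are made up for the example.

```r
library(methods)

# Hypothetical class: one small slot and one large slot.
setClass("Holder", representation(little = "numeric", BIG = "numeric"))

x <- new("Holder", little = c(1, 2, 3), BIG = numeric(1e6))
other <- c(10, 20, 30)

# Nested replacement, as written in the question.
y1 <- x
y1@little[] <- other

# The same computation spelled out as two simple replacements.
y2 <- x
tmp_little <- y2@little   # pull out the small vector
tmp_little[] <- other     # inner replacement: `[<-` on the small vector
y2@little <- tmp_little   # outer replacement: `@<-` on the whole object

# Both routes end with a simple slot assignment on the whole object.
stopifnot(identical(y1, y2))
```

Either way the last step is a slot replacement applied to the whole object, which is where the copying question arises.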

If x has >1 reference, as it tends to, EnsureLocal() will call duplicate().
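
The user-visible side of this is ordinary value semantics: when two names refer to the same object, a replacement through one must leave the other unchanged. A small sketch, with a made-up class Model. The optional tracemem() call only reports copies when R was built with memory profiling, and what it prints depends on the R version, so no output is shown here.

```r
library(methods)

# Hypothetical class for illustration only.
setClass("Model", representation(little = "numeric", BIG = "numeric"))

x <- new("Model", little = c(0, 0), BIG = numeric(1e6))
y <- x                                   # a second reference to the same object

# Optional: ask R to report duplications of this object, if supported.
if (capabilities("profmem")) tracemem(x)

y@little <- c(1, 2)                      # replacement on a shared object

stopifnot(identical(x@little, c(0, 0)))  # x is untouched
stopifnot(identical(y@little, c(1, 2)))  # only y sees the change

if (capabilities("profmem")) untracemem(x)
```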

I think the only difference is that your second form gets you to 
duplicate the little vector twice. ;-)
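
For anyone who wants to try this on their own objects, here is one hedged way to compare the three replacement forms from the quoted message when they are repeated inside a loop. The class name Big, the helper names and the sizes are all invented for the example. Timings are printed rather than asserted, since they depend on the R version and on whether `@<-` is a primitive in it.

```r
library(methods)

# Hypothetical class with one small and one large slot.
setClass("Big", representation(little = "numeric", BIG = "numeric"))

n    <- 1e6   # length of the large component (arbitrary)
reps <- 100   # number of small updates (arbitrary)

# One helper per replacement form; each updates "little" repeatedly.
update_slot <- function(x) { for (i in seq_len(reps)) x@little <- as.numeric(i); x }
update_attr <- function(x) { for (i in seq_len(reps)) attr(x, "little") <- as.numeric(i); x }
update_list <- function(x) { for (i in seq_len(reps)) x$little <- as.numeric(i); x }

s4 <- new("Big", little = 0, BIG = numeric(n))   # slots
at <- structure(numeric(n), little = 0)          # attributes
li <- list(little = 0, BIG = numeric(n))         # named list components

t_slot <- system.time(s4 <- update_slot(s4))
t_attr <- system.time(at <- update_attr(at))
t_list <- system.time(li <- update_list(li))

# Elapsed seconds for each form; compare across R versions.
print(c(slot = t_slot[["elapsed"]],
        attr = t_attr[["elapsed"]],
        list = t_list[["elapsed"]]))
```

If the slot form is markedly slower than the other two, that suggests the whole object is being copied on each pass through the loop.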

> I was using the second form in (yet another!) modification of the internal
> representation of mixed-effects models in the lme4 package in the hopes
> that it would not trigger copying of the entire object.  The object
> representing the model is quite large but the changes during iterations are
> to small vectors representing parameters and coefficients.
> On Thu, Jan 3, 2013 at 1:08 PM, John Chambers <jmc at r-project.org> wrote:
>> Martin Morgan commented in email to me that a change to any slot of an
>> object that has other, large slot(s) does substantial computation,
>> presumably from copying the whole object.  Is there anything to be done?
>> There are in fact two possible changes, one automatic but only partial,
>> the other requiring some action on the programmer's part.  Herewith the
>> first; I'll discuss the second in a later email.
>> Some context:  The notion is that our object has some big data and some
>> additional smaller things.  We need to change the small things but would
>> rather not copy the big things all the time.  (With long vectors, this
>> becomes even more relevant.)
>> There are three likely scenarios: slots, attributes and named list
>> components.  Suppose our object has "little" and "BIG" encoded in one of
>> these.
>> The three relevant computations are:
>>
>>    x@little <- other
>>    attr(x, "little") <- other
>>    x$little <- other
>>
>> It turns out that these are all similar in behavior with one important
>> exception--fixing that is the automatic change.
>> I need to review what R does here. All these are replacement functions,
>> `@<-`, `attr<-`, `$<-`.  The evaluator checks before calling any
>> replacement whether the object needs to be duplicated (in a routine
>> EnsureLocal()).  It does that by examining a special field that holds the
>> reference status of the object.
>> Some languages, such as Python (and S), keep reference counts for each
>> object, de-allocating the object when the reference count drops back to
>> zero.  R uses a different strategy. Its NAMED() field is 0, 1 or 2
>> according to whether the object has been assigned never, once or more than
>> once.  The field is not a reference count and is not decremented--relevant
>> for this issue.  Objects are de-allocated only when garbage collection
>> occurs and the object does not appear in any current frame or other context.
>> (I did not write any of this code, so apologies if I'm misrepresenting it.)
>> When any of these replacement operations first occurs for a particular
>> object in a particular function call, it's very likely that the reference
>> status will be 2 and EnsureLocal() will duplicate it--all of it,
>> regardless of which of the three forms is used.
>> Here the non-level-playing-field aspect comes in.  `@<-` is a normal R
>> function (a "closure") but the other two are primitives in the main code
>> for R.  Primitives have no frame in which arguments are stored.  As a
>> result the new version of x is normally stored with status 1.
>> If one does a second replacement in the same call (in a loop, e.g.) that
>> should not normally copy again.  But the result of `@<-` will be an object
>> from its frame and will have status 2 when saved, forcing a copy each time.
>> So the change, naturally, is that R 3.0.0 will have a primitive
>> implementation of `@<-`.  This has been implemented in r-devel (rev. 61544).
>> Please try it out _before_ we issue that version, especially if you own a
>> package that does things related to this question.
>> John
>> PS:  Some may have noticed that I didn't mention a fourth approach: fields
>> in a reference class object.  The assumption was that we wanted classical,
>> functional behavior here.  Reference classes don't have the copy problem
>> but don't behave functionally either.  But that is in fact the direction
>> for the other approach.  I'll discuss that later, when the corresponding
>> code is available.
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
