[Rd] Small changes to big objects (1)
jmc at r-project.org
Mon Jan 7 20:20:01 CET 2013
On 1/7/13 9:59 AM, Douglas Bates wrote:
> Is there a difference in the copying behavior of
> x@little <- other
> and
> x@little[] <- other
Not in the direction you were hoping, as far as I can tell.
Nested replacement expressions in R and S are unraveled and done as
repeated simple replacements. So either way you end up with, in effect
x@little <- something
If x has >1 reference, as it tends to, EnsureLocal() will call duplicate().
I think the only difference is that your second form gets you to
duplicate the little vector twice. ;-)
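
As a rough illustration of the unraveling, here is a minimal sketch. The class "Model", its slot values, and the index 2 are made up for the example, and `slot<-` stands in for `@<-`. It compares a nested replacement with the same steps written out by hand; it shows only that the two give the same result, not how many copies are made.

```r
library(methods)

# Hypothetical class: one small slot, one large slot.
setClass("Model", representation(little = "numeric", BIG = "numeric"))
x <- new("Model", little = c(1, 2, 3), BIG = numeric(1e5))

# Nested replacement, as one would normally write it.
y1 <- x
y1@little[2] <- 0

# The same thing as repeated simple replacements.
# (R uses an internal variable named `*tmp*`; `tmp` plays that role here.)
y2 <- x
tmp <- y2
y2 <- `slot<-`(tmp, "little", value = `[<-`(tmp@little, 2, value = 0))
rm(tmp)
```

Either way the last step is a simple replacement of the whole slot, which is where EnsureLocal() gets its chance to duplicate x.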
> I was using the second form in (yet another!) modification of the internal
> representation of mixed-effects models in the lme4 package in the hopes
> that it would not trigger copying of the entire object. The object
> representing the model is quite large but the changes during iterations are
> to small vectors representing parameters and coefficients.
> On Thu, Jan 3, 2013 at 1:08 PM, John Chambers <jmc at r-project.org> wrote:
>> Martin Morgan commented in email to me that a change to any slot of an
>> object that has other, large slot(s) does substantial computation,
>> presumably from copying the whole object. Is there anything to be done?
>> There are in fact two possible changes, one automatic but only partial,
>> the other requiring some action on the programmer's part. Herewith the
>> first; I'll discuss the second in a later email.
>> Some context: The notion is that our object has some big data and some
>> additional smaller things. We need to change the small things but would
>> rather not copy the big things all the time. (With long vectors, this
>> becomes even more relevant.)
>> There are three likely scenarios: slots, attributes and named list
>> components. Suppose our object has "little" and "BIG" encoded in one of
>> these. The three relevant computations are:
>> x@little <- other
>> attr(x, "little") <- other
>> x$little <- other
>> It turns out that these are all similar in behavior with one important
>> exception--fixing that is the automatic change.
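
For concreteness, a small sketch of the three scenarios side by side. The class "Holder" and all the values are invented for the example; only the three replacement forms come from the message above.

```r
library(methods)

# Hypothetical data: a large vector and a small replacement value.
big <- numeric(1e5)
other <- c(0.5, 1.5)

# 1. Slot of an S4 object
setClass("Holder", representation(little = "numeric", BIG = "numeric"))
s <- new("Holder", little = c(1, 2), BIG = big)
s@little <- other

# 2. Attribute attached to the large vector
a <- big
attr(a, "little") <- c(1, 2)
attr(a, "little") <- other

# 3. Named component of a list
l <- list(little = c(1, 2), BIG = big)
l$little <- other
```

In each case only the small part changes; whether the large part is copied along the way is the question discussed here.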
>> I need to review what R does here. All these are replacement functions,
>> `@<-`, `attr<-`, `$<-`. The evaluator checks before calling any
>> replacement whether the object needs to be duplicated (in a routine
>> EnsureLocal()). It does that by examining a special field that holds the
>> reference status of the object.
>> Some languages, such as Python (and S), keep reference counts for each
>> object, de-allocating the object when the reference count drops back to
>> zero. R uses a different strategy. Its NAMED() field is 0, 1 or 2
>> according to whether the object has been assigned never, once or more than
>> once. The field is not a reference count and is not decremented--relevant
>> for this issue. Objects are de-allocated only when garbage collection
>> occurs and the object does not appear in any current frame or other context.
>> (I did not write any of this code, so apologies if I'm misrepresenting it.)
>> When any of these replacement operations first occurs for a particular
>> object in a particular function call, it's very likely that the reference
>> status will be 2 and EnsureLocal will duplicate it--all of it. Regardless
>> of which of the three forms is used.
>> Here the non-level-playing-field aspect comes in. `@<-` is a normal R
>> function (a "closure") but the other two are primitives in the main code
>> for R. Primitives have no frame in which arguments are stored. As a
>> result the new version of x is normally stored with status 1.
>> If one does a second replacement in the same call (in a loop, e.g.) that
>> should not normally copy again. But the result of `@<-` will be an object
>> from its frame and will have status 2 when saved, forcing a copy each time.
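
One way to watch this on your own installation is tracemem(), which reports each duplication of a marked object. The sketch below is only an assumption about how one might test it: the class "Fit", the function update_many() and the sizes are hypothetical. How many copies are reported depends on the R version, so no expected output is shown.

```r
library(methods)

# Hypothetical class: small parameter vector plus a large data slot.
setClass("Fit", representation(little = "numeric", BIG = "numeric"))

# Repeated small changes to a big object inside one function call.
update_many <- function(x, n) {
  for (i in seq_len(n)) {
    x@little <- x@little + 1
  }
  x
}

fit <- new("Fit", little = c(0, 0), BIG = numeric(1e6))

if (capabilities("profmem")) {
  tracemem(fit)                 # prints a line whenever the object is duplicated
  fit2 <- update_many(fit, 5)
  untracemem(fit)
} else {
  # tracemem() is unavailable in builds without memory profiling
  fit2 <- update_many(fit, 5)
}
```

If the replacement copies on every iteration, one would expect a tracemem line per pass through the loop; if only the first replacement copies, far fewer.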
>> So the change, naturally, is that R 3.0.0 will have a primitive
>> implementation of `@<-`. This has been implemented in r-devel (rev. 61544).
>> Please try it out _before_ we issue that version, especially if you own a
>> package that does things related to this question.
>> PS: Some may have noticed that I didn't mention a fourth approach: fields
>> in a reference class object. The assumption was that we wanted classical,
>> functional behavior here. Reference classes don't have the copy problem
>> but don't behave functionally either. But that is in fact the direction
>> for the other approach. I'll discuss that later, when the corresponding
>> code is available.
>> R-devel at r-project.org mailing list