[Rd] Small changes to big objects (2): Local Reference Classes
jmc at r-project.org
Sat Jan 5 19:55:51 CET 2013
Back to the scenario in my email of Jan. 3: We have objects with some
large (or very large) components and some other components as well. We
need to modify the smaller stuff but are not changing the big data. How
can we avoid copying the big data?
(A use case might be some modeling of large data where we want to save
various versions, all including the same original data but differing in
some stored parameters, estimates, etc.)
A new kind of class, "local reference classes", has been added to r-devel
(rev. 61562). The idea is that representing data with these classes
avoids unnecessary copying while retaining the standard R
functional semantics, or close to that. For a quick look, see
Here is the idea.
We imagine that our object has components/slots/attributes/fields
"BigData", say, and "twiddle". With normal R evaluation, replacing
"twiddle" in the object will cause internal duplication of the whole
thing in the very likely case that we have passed some object, myX say,
as argument x to a function.
As soon as the evaluator sees a replacement function, "@<-", "$<-" or
"attr<-" for an ordinary object, the EnsureLocal routine calls
duplicate() if the object has more than one reference, as it will in
this scenario. And BigData gets copied. I think it's important to
understand that this follows from the "replacement function" concept in
S and R: A replacement function takes an object from the frame, does
whatever it does, and returns a replacement for this object. The
evaluator doesn't know what the replacement function does, so the
EnsureLocal strategy is inevitable.
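
A minimal sketch of the replacement-function semantics just described, using an ordinary list. The function name setTwiddle and the data are made up for illustration. It checks only the semantics (the caller's object is unchanged); whether BigData is physically duplicated depends on the R version and is not tested here.

```r
# Ordinary (non-reference) object: a plain list with a big and a small component.
myX <- list(BigData = rnorm(1000), twiddle = 1)

# Hypothetical helper: replaces the small component of its argument.
setTwiddle <- function(x, value) {
  # The evaluator treats this roughly as:
  #   x <- `$<-`(x, "twiddle", value = value)
  # i.e. the local x is rebound to whatever the replacement function returns.
  x$twiddle <- value
  x
}

y <- setTwiddle(myX, 2)

# Functional semantics: the caller's myX is untouched, y carries the new value.
stopifnot(myX$twiddle == 1, y$twiddle == 2,
          identical(myX$BigData, y$BigData))
```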
There is one trapdoor, however. duplicate() does essentially nothing
for data types that are references, most importantly for environments.
That's the basis for reference classes.
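
To see the trapdoor in action, here is a small sketch with a bare environment. The names e and bump are illustrative only. Because the function receives a reference, the change is visible to the caller.

```r
# An environment is a reference: passing it to a function does not copy it.
e <- new.env()
assign("twiddle", 1, envir = e)

# Hypothetical helper: modifies the environment it is given.
bump <- function(env) {
  env$twiddle <- env$twiddle + 1   # changes the one shared environment
  invisible(NULL)
}

bump(e)

# The caller sees the change, unlike with an ordinary list or vector.
stopifnot(e$twiddle == 2)
```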
But a reference class is not exactly what we want here. Our different
models share the BigData but should not share the same other fields. If
I twiddle parameters in one model, it had better not change another model.
So it's R's standard "functional" semantics we want.
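
The problem can be sketched with a standard reference class. The class name Model and the function twiddler are assumptions made for this illustration. Twiddling inside the function also changes the object the caller passed in, which is exactly what we do not want here.

```r
library(methods)

# A standard reference class with a big field and a small one (illustrative).
Model <- setRefClass("Model",
  fields = list(BigData = "numeric", twiddle = "numeric"))

m1 <- Model$new(BigData = rnorm(1000), twiddle = 1)

# Hypothetical function that adjusts the small field and returns the object.
twiddler <- function(x) {
  x$twiddle <- x$twiddle + 1
  x
}

m2 <- twiddler(m1)

# Reference semantics: m1 and m2 are the same object, so m1 changed too.
stopifnot(m1$twiddle == 2, m2$twiddle == 2)
```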
In fact, R is not strictly a functional language. Rather it has the
idea of "local references": ordinary assignments change the references
in the local frame but have no external effect.
Local reference classes implement essentially this using reference class
fields. Specifically, calling a method $ensureLocal() on an object,
directly or via replacing a field, causes a *shallow* copy of the object
to be created and remembered locally. Subsequent replacements have no
effect on the object passed in to the function.
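
A short sketch of how this might be used, assuming the interface documented under ?LocalReferenceClasses in the methods package, where a class becomes local by including contains = "localRefClass". The class name myIter, the function twiddler and the data are illustrative, and the details may differ in the r-devel revision mentioned above.

```r
library(methods)

# A local reference class: same fields as before, but with local semantics.
myIter <- setRefClass("myIter", contains = "localRefClass",
  fields = list(BigData = "numeric", twiddle = "numeric"))

tw <- c(1, 2, 3)
x1 <- myIter$new(BigData = rnorm(1000), twiddle = tw)  # not really big

# Hypothetical function that repeatedly modifies the small field.
twiddler <- function(x, n) {
  x$ensureLocal()   # explicit here; replacing a field would trigger it anyway
  for (i in seq_len(n)) {
    x$twiddle <- x$twiddle + 1
  }
  x
}

x2 <- twiddler(x1, 10)

# The caller's object keeps its fields; the returned object has the new ones.
stopifnot(identical(x1$twiddle, tw),
          identical(x2$twiddle, tw + 10),
          identical(x1$BigData, x2$BigData))
```

The checks confirm only the semantics. That BigData is not duplicated along the way would have to be verified separately, for example with a debugger or memory profiling.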
The implementation is fairly simple, but the programmer does have to be
aware of what's happening, to some extent. Please look it over and play
with it if it seems interesting.