[Rd] reference class internals

Martin Morgan mtmorgan at fhcrc.org
Fri Jan 10 05:27:09 CET 2014


On 01/09/2014 07:53 PM, Norm Matloff wrote:
>
> Thanks, Hadley and Simon.
>
> The reason I asked today was that when reference classes first came out,
> it had appeared to me that there is no performance advantage to using
> reference classes, that it was mainly a style issue (encapsulation,
> etc.).  Unless I'm missing something, both of you have confirmed my
> original impression, correct?

We've used reference classes for a performance benefit. For example, updating
a single (even small) field in an S4 object triggers a copy of the entire
object, whereas the fields of a reference class can be updated independently.
This is especially true inside function calls (e.g., methods or slot
accessors), where the object is marked to be duplicated.


> a = setClass("A", representation(x="numeric"))(x=1:5)
> .Internal(inspect(a))
@5237508 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
   @5237460 02 LISTSXP g0c0 []
     TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
     @5225db8 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
     TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
     @52355c8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
       @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
     ATTRIB:
       @52373f0 02 LISTSXP g0c0 []
         TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
         @5235598 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
           @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"
> a@x[1]=2L
> .Internal(inspect(a))  ## almost everything duplicated!
@5243cd0 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
   @5243c60 02 LISTSXP g0c0 []
     TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
     @5225b30 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
     TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
     @52405f8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
       @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
     ATTRIB:
       @5243bf0 02 LISTSXP g0c0 []
         TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
         @52405c8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
           @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"
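
One way to watch for the duplication without comparing addresses by eye is
tracemem(). This is only a sketch: it assumes an R build with memory profiling
enabled (hence the capabilities() guard), and whether a copy is actually
reported for the slot update depends on the R version, so no output is shown.

```r
library(methods)

# Same class and object as in the transcript above.
A <- setClass("A", representation(x = "numeric"))
a <- A(x = 1:5)

# tracemem() needs an R build with memory profiling support.
if (capabilities("profmem")) {
    tracemem(a)      # from here on, R reports each duplication of 'a'
    a@x[1] <- 2L     # a "tracemem[...]" line printed here indicates a copy
    untracemem(a)
} else {
    a@x[1] <- 2L     # same update, just without the tracing
}
a@x
```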

This also influences the performance of other R objects, of course, e.g.,

 > f = function(x) { x$a[1] = 2L; x }
 > l = list(a=1:5); .Internal(inspect(l))
@53f8448 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
   @53cef48 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
   @53f9190 02 LISTSXP g0c0 []
     TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
     @53f8418 16 STRSXP g0c1 [] (len=1, tl=0)
       @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"
 > .Internal(inspect(f(l)))
@53f83e8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
   @53cef00 13 INTSXP g0c3 [] (len=5, tl=0) 2,2,3,4,5
ATTRIB:
   @53f9988 02 LISTSXP g0c0 []
     TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
     @53f83b8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
       @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"

With reference classes, copies are localized to the updated field. I can't
show this with .Internal(inspect()), though, because x = new.env(); x$x = x;
.Internal(inspect(x)) [mimicking .self in reference classes] goes into an
infinite (? I didn't wait that long) recursion.
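
What can be shown without inspect() is the difference in semantics that goes
with it. The sketch below uses two hypothetical classes, S4A and RefA (not
from any package), and two hypothetical helper functions. It does not measure
copying directly; it only shows that the S4 update happens on a copy local to
the function, while the reference class update is visible to the caller.

```r
library(methods)

# Hypothetical classes for illustration only.
S4A  <- setClass("S4A", representation(x = "integer", big = "integer"))
RefA <- setRefClass("RefA", fields = list(x = "integer", big = "integer"))

s <- S4A(x = 1:5, big = seq_len(1e6))
r <- RefA$new(x = 1:5, big = seq_len(1e6))

# S4: the function modifies its own copy and has to return it.
bump_s4 <- function(obj) { obj@x[1] <- 2L; obj }

# Reference class: the function modifies the object the caller holds.
bump_ref <- function(obj) { obj$x[1] <- 2L; invisible(obj) }

s2 <- bump_s4(s)
bump_ref(r)

s@x[1]    # the caller's S4 object is untouched
s2@x[1]   # the returned copy carries the change
r$x[1]    # the reference object was changed in place
```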

I actually think reference classes have a surprising performance _hit_
compared to other R approaches to minimizing copying; this has come up on this
or the R mailing list before, but I've lost track of the original. Here's a
StackOverflow version:

http://stackoverflow.com/questions/18677696/stack-class-in-r-something-more-concise/18678440#18678440
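
A rough way to check the overhead on a given machine is to time the same
update on a reference class field and on a plain environment, taking the
environment as one of the other copy-avoiding approaches. The class name
Counter and the iteration count k are made up for this sketch, and timings
vary too much between versions and machines to quote any here.

```r
library(methods)

# Hypothetical class for illustration only.
Counter <- setRefClass("Counter", fields = list(n = "numeric"))

rc  <- Counter$new(n = 0)
env <- new.env()
env$n <- 0

k <- 1e4  # arbitrary iteration count

# The same increment, once through a reference class field and once
# through a plain environment.
t_rc  <- system.time(for (i in seq_len(k)) rc$n  <- rc$n  + 1)
t_env <- system.time(for (i in seq_len(k)) env$n <- env$n + 1)

# Elapsed seconds for each approach.
c(refclass = t_rc[["elapsed"]], environment = t_env[["elapsed"]])
```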

Martin


> Norm
>
> On Thu, Jan 09, 2014 at 09:44:10PM -0500, Simon Urbanek wrote:
>> On Jan 9, 2014, at 6:20 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
>>
>>> Bottom line:  Really no different from the case of ordinary vectors that are not in reference classes, right?  In other words, not true pass-by-reference.
>>>
>>
>> The pass-by-reference applies to the object itself, not necessarily to anything you obtain by calling a function on the object (like extracting a part from it). Vectors are not reference-semantics objects so regular rules apply.
>>
>> If you pass a reference semantics object to a function, the function can modify the object. If you pass any other object, the contents are guaranteed to not be touched. Reference-semantics objects in R are literally passed by reference (same C pointer), so yes, it is true pass-by-reference.
>>
>> Cheers,
>> Simon
>>
>>
>> (*) - technically, there is a thin non-reference wrapper around the instances of reference classes, because there are things you don't want to happen to your ref-semantics instance - e.g. you don't want unclass(x) to destroy x and all instances of it (which it would do if there was no wrapper). But the actual payload of the object is a true ref-semantics object - an environment - that is always passed by reference.
>>
>>
>>
>>> Norm
>>>
>>> On Thu, Jan 09, 2014 at 04:43:44PM -0600, Hadley Wickham wrote:
>>>> It's a bit of a simplification, reference classes are wrappers around
>>>> environments.  So if modifying a value in an environment would create
>>>> a copy, then modifying the same value in a reference class will also
>>>> create a copy.
>>>>
>>>> The situation with modifying a vector is a bit complicated as it will
>>>> sometimes be modified in place and sometimes be duplicated and
>>>> modified (depending on whether its NAMED attribute is 1 or 2, and
>>>> exactly how you're modifying it).
>>>>
>>>> Hadley
>>>>
>>>> On Thu, Jan 9, 2014 at 4:33 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
>>>>> I have a question about reference classes, which someone here
>>>>> undoubtedly can answer immediately, saving me hours of wading through
>>>>> indecipherable internal code. :-)  Thanks in advance.
>>>>>
>>>>> Reference class data is mutable, fine, but in what sense?  Is it really
>>>>> physical,  or is it just a view given to the programmer?
>>>>>
>>>>> If for instance I have vector as a field in a reference class, and I
>>>>> change one element of the vector, is it really true that the change is
>>>>> guaranteed to be made in-place, no copying, no memory reallocation etc?
>>>>>
>>>>> Norm
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>>
>>>>
>>>> --
>>>> http://had.co.nz/
>>>
>>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


