[Rd] reference class internals

Norm Matloff matloff at cs.ucdavis.edu
Fri Jan 10 06:08:51 CET 2014


I guess I should explain where I'm coming from in all this.

I've always been something of a skeptic on object-oriented programming.
Though I agree it has some advantages, and I do use it myself (in
Python), in general I think it makes one work far too hard for the
potential benefit.  C++ templates (which I use in Thrust) drive me
crazy; I find them very frustrating.

So I am, for better or worse, one of those people who don't even like S4
(again a style issue).  Obviously those who do like S4 may get a
performance benefit via reference classes in the situation Martin
mentions below.

I've been meaning for some time to look into whether there might
actually be a performance benefit for non-OOP programmers like me,
thinking the answer would be no but wanting to confirm.  So,
today I finally got around to asking, and immediately got three quick,
cogent and informative replies.  This testifies to the quality of the
membership of this list!  Thanks very much.

Norm

On Thu, Jan 09, 2014 at 08:27:09PM -0800, Martin Morgan wrote:
> On 01/09/2014 07:53 PM, Norm Matloff wrote:
> >
> >Thanks, Hadley and Simon.
> >
> >The reason I asked today was that when reference classes first came out,
> >it had appeared to me that there is no performance advantage to using
> >reference classes, that it was mainly a style issue (encapsulation,
> >etc.).  Unless I'm missing something, both of you have confirmed my
> >original impression, correct?
> 
> We've used reference classes for performance benefit. E.g., updating
> a single small field in an S4 object triggers a copy of the entire
> object, whereas for a reference class the fields can be updated
> independently. This is especially true when the update (e.g., a slot
> assignment) happens inside a function or method call, where the
> object is marked to be duplicated.
> 
> 
> > a = setClass("A", representation(x="numeric"))(x=1:5)
> > .Internal(inspect(a))
> @5237508 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
> ATTRIB:
>   @5237460 02 LISTSXP g0c0 []
>     TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
>     @5225db8 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
>     TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
>     @52355c8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
>       @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
>     ATTRIB:
>       @52373f0 02 LISTSXP g0c0 []
> 	TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
> 	@5235598 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
> 	  @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"
> > a@x[1] = 2L
> > .Internal(inspect(a))  ## almost everything duplicated!
> @5243cd0 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
> ATTRIB:
>   @5243c60 02 LISTSXP g0c0 []
>     TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
>     @5225b30 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
>     TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
>     @52405f8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
>       @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
>     ATTRIB:
>       @5243bf0 02 LISTSXP g0c0 []
> 	TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
> 	@52405c8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
> 	  @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"
> 
> This also influences the performance of other R objects, of course, e.g.,
> 
> > f = function(x) { x$a[1] = 2L; x }
> > l = list(a=1:5); .Internal(inspect(l))
> @53f8448 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>   @53cef48 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
> ATTRIB:
>   @53f9190 02 LISTSXP g0c0 []
>     TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
>     @53f8418 16 STRSXP g0c1 [] (len=1, tl=0)
>       @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"
> > .Internal(inspect(f(l)))
> @53f83e8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>   @53cef00 13 INTSXP g0c3 [] (len=5, tl=0) 2,2,3,4,5
> ATTRIB:
>   @53f9988 02 LISTSXP g0c0 []
>     TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
>     @53f83b8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
>       @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"
> 
> Copies are localized to the updated field with reference classes
> (can't show this with .Internal(inspect()), though, because x =
> new.env(); x$x = x; .Internal(inspect(x)) [mimicking .self in
> reference classes] has an infinite (? I didn't wait that long)
> recursion).
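
One way to see the difference described above without .Internal(inspect()) is to compare what the caller observes after an update made inside a function. This is only a minimal sketch and not code from the thread: the class names S4Holder and RefHolder and the two helper functions are hypothetical. It demonstrates copy versus reference semantics, not how much memory R actually duplicates, which depends on the R version.

```r
library(methods)

# Hypothetical classes for illustration: one small field (x), one larger field (big).
setClass("S4Holder", representation(x = "numeric", big = "numeric"))
RefHolder <- setRefClass("RefHolder",
                         fields = list(x = "numeric", big = "numeric"))

# S4: the function works on its own copy and must return the modified object.
bump_s4 <- function(obj) {
  obj@x[1] <- 2
  obj
}

# Reference class: the function updates the caller's object directly.
bump_ref <- function(obj) {
  obj$x[1] <- 2
  invisible(NULL)
}

s4 <- new("S4Holder", x = as.numeric(1:5), big = numeric(10))
s4_new <- bump_s4(s4)

ref <- RefHolder$new(x = as.numeric(1:5), big = numeric(10))
bump_ref(ref)

s4@x[1]      # the caller's S4 object is unchanged
s4_new@x[1]  # only the returned object carries the update
ref$x[1]     # the reference object was changed in place
```

With the S4 version, the caller only sees the change by keeping the returned object; with the reference class, the field update is visible without returning anything.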
> 
> I think actually reference classes have a surprising performance
> _hit_ compared to other R approaches to minimizing copying; this has
> come up on this or the R mailing list before, but I've lost track of
> the original. Here's a StackOverflow version
> 
> http://stackoverflow.com/questions/18677696/stack-class-in-r-something-more-concise/18678440#18678440
> 
> Martin
> 
> 
> >Norm
> >
> >On Thu, Jan 09, 2014 at 09:44:10PM -0500, Simon Urbanek wrote:
> >>On Jan 9, 2014, at 6:20 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
> >>
> >>>Bottom line:  Really no different from the case of ordinary vectors that are not in reference classes, right?  In other words, not true pass-by-reference.
> >>>
> >>
> >>The pass-by-reference applies to the object itself, not necessarily to anything you obtain by calling a function on the object (like extracting a part from it). Vectors are not reference-semantics objects so regular rules apply.
> >>
> >>If you pass a reference semantics object to a function, the function can modify the object. If you pass any other object, the contents are guaranteed to not be touched. Reference-semantics objects in R are literally passed by reference (same C pointer), so yes, it is true pass-by-reference.
> >>
> >>Cheers,
> >>Simon
> >>
> >>
> >>(*) - technically, there is a thin non-reference wrapper around the instances of reference classes, because there are things you don't want to happen to your ref-semantics instance - e.g. you don't want unclass(x) to destroy x and all instances of it (which it would do if there was no wrapper). But the actual payload of the object is a true ref-semantics object - an environment - that is always passed by reference.
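
The distinction drawn above, between a reference-semantics object and an ordinary one, can be sketched with a plain environment and a list. This is an illustrative example, not code from the thread; the function name set_first is hypothetical.

```r
# Hypothetical helper: tries to overwrite the first element of obj$v.
set_first <- function(obj) {
  obj$v[1] <- 99L
  invisible(NULL)
}

e <- new.env()      # reference semantics: passed by reference
e$v <- 1:5

l <- list(v = 1:5)  # ordinary object: the callee modifies a local copy

set_first(e)
set_first(l)

e$v[1]  # the environment seen by the caller was modified
l$v[1]  # the caller's list is untouched
```

The same function body behaves differently only because of what is passed in: the environment is shared with the caller, while the list is protected by R's usual copy-on-modify rules.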
> >>
> >>
> >>
> >>>Norm
> >>>
> >>>On Thu, Jan 09, 2014 at 04:43:44PM -0600, Hadley Wickham wrote:
> >>>>It's a bit of a simplification, but reference classes are wrappers around
> >>>>environments.  So if modifying a value in an environment would create
> >>>>a copy, then modifying the same value in a reference class will also
> >>>>create a copy.
> >>>>
> >>>>The situation with modifying a vector is a bit complicated as it will
> >>>>sometimes be modified in place and sometimes be duplicated and
> >>>>modified (depending on whether its NAMED attribute is 1 or 2, and
> >>>>exactly how you're modifying it).
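
A small sketch of the point above: the reference object itself is shared, but a vector stored in one of its fields still follows the ordinary rules, so R has to duplicate it before writing when something else also refers to it. The class name Holder is hypothetical and this is not code from the thread.

```r
library(methods)

# Hypothetical reference class with a single integer field.
Holder <- setRefClass("Holder", fields = list(v = "integer"))

v <- 1:5
h <- Holder$new(v = v)  # the field and the variable `v` start with the same values

h$v[1] <- 99L           # the field is updated; the outside variable must not change

h2 <- h                 # no copy of the object: h2 and h are the same instance
h2$v[2] <- 7L

v[1]     # the outside vector is unaffected by the field update
h$v[1]   # the field holds the new value
h$v[2]   # the update made through h2 is visible through h
```

Whether a particular field update is done in place or via a duplicate is an internal detail that can differ between R versions; the sketch only shows the behavior that is guaranteed at the language level.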
> >>>>
> >>>>Hadley
> >>>>
> >>>>On Thu, Jan 9, 2014 at 4:33 PM, Norm Matloff <matloff at cs.ucdavis.edu> wrote:
> >>>>>I have a question about reference classes, which someone here
> >>>>>undoubtedly can answer immediately, saving me hours of wading through
> >>>>>indecipherable internal code. :-)  Thanks in advance.
> >>>>>
> >>>>>Reference class data is mutable, fine, but in what sense?  Is it really
> >>>>>physical,  or is it just a view given to the programmer?
> >>>>>
> >>>>>If, for instance, I have a vector as a field in a reference class, and I
> >>>>>change one element of the vector, is it really true that the change is
> >>>>>guaranteed to be made in-place, no copying, no memory reallocation etc?
> >>>>>
> >>>>>Norm
> >>>>>
> >>>>>______________________________________________
> >>>>>R-devel at r-project.org mailing list
> >>>>>https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>
> >>>>
> >>>>
> >>>>--
> >>>>http://had.co.nz/
> >>>
> >>
> >
> 
> 
> -- 
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793


