[Rd] surprising behaviour of names<-

Sat Mar 14 07:22:34 CET 2009

Berwin A Turlach wrote:
> On Fri, 13 Mar 2009 19:41:42 +0100
> Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>
>   
>> indeed, you said "R supposedly uses call-by-value (though we know how
>> to circumvent that, don't we?)".
>>
>> in that vein, R supposedly can be used to do valid statistical
>> computations (though we know how to circumvent it) ;)
>>     
>
> Sure, use Excel? ;-)
>   

no, it has a buggy round

>  
>   
>>> Indeed, if you type these two commands on the command line, then it
>>> is not surprising that a copy of tmp is returned since you create a
>>> temporary object that ends up in the symbol table and persists after
>>> the commands are finished.
>>>   
>>>       
>> what does command line have to do with it?
>>     
>
> If you want to find out what goes on under the hood, it is not
> necessarily sufficient to do the same calculations on the command line.
>  
>   
>>> Obviously, assuming that R really executes 
>>> 	`*tmp*` <- x
>>> 	x <- "names<-"(`*tmp*`, value=c("a","b"))
>>> under the hood, in the C code, then *tmp* does not end up in the
>>> symbol table 
>>>       
>> no?
>>     
>
> Well, I don't see any new object created in my workspace after
> 	x <- 4
> 	names(x) <- "foo"
> Do you?
>   

of course not.  that's why i'd say the two above are *not* equivalent. 

i haven't noticed the 'in the c code';  do you mean the r interpreter
actually generates, in the c code, such r expressions for itself to
evaluate?
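
a minimal sketch of that difference, assuming a current r; the
environments `e1` and `e2` are my own names, used only to keep the
listings clean:

```r
# replacement form: only x is left behind
e1 <- new.env()
evalq({
  x <- 4
  names(x) <- "foo"
}, e1)
ls(e1)

# the expansion written out by hand: `*tmp*` stays as an ordinary variable
e2 <- new.env()
evalq({
  x <- 4
  `*tmp*` <- x
  x <- "names<-"(`*tmp*`, value = "foo")
}, e2)
ls(e2)
```

both versions give the same x; they differ only in what else remains
bound afterwards.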

>   
>> i guess you have looked under the hood;  point me to the relevant
>> code.
>>     
>
> No I did not, because I am not interested in knowing such intimate
> details of R, but it seems you were interested.
>   

yes, but then your claim about what happens under the hood, in the c
code, is a pure stipulation.  and you got the example from the r
language definition sec. 10.2, which says the forms are equivalent, with
no 'under the hood, in the c code' comment.

you're just showing that your statements cannot be taken seriously.

>  
>   
>> yes, *if* you are able to predict the refcount of the object passed to
>> 'names<-' *then* you can predict what 'names<-' will do, [...] 
>>     
>
> I think Simon already pointed out that you seem to have a wrong
> picture of what is going on.  As far as I know, there is no refcount
> for objects.  
>
> The relevant documentation would be the R Internals manual, 1.1 SEXPs:
>
>   What R users think of as variables or objects are symbols which are
>   bound to a value. The value can be thought of as either a SEXP (a
>   pointer), or the structure it points to, a SEXPREC (and there are
>   alternative forms used for vectors, namely VECSXP pointing to
>   VECTOR_SEXPREC structures).
>
> and 1.1.2 Rest of header:
>
>   The named field is set and accessed by the SET_NAMED
>   and NAMED macros, and take values 0, 1 and 2. R has a `call by value'
>   illusion, so an assignment like
>
>       b <- a
>
>   appears to make a copy of a and refer to it as b. However, if neither
>   a nor b are subsequently altered there is no need to copy. What really
>   happens is that a new symbol b is bound to the same value as a and the
>   named field on the value object is set (in this case to 2). When an
>   object is about to be altered, the named field is consulted. A value
>   of 2 means that the object must be duplicated before being changed.
>   (Note that this does not say that it is necessary to duplicate, only
>   that it should be duplicated whether necessary or not.) A value of 0
>   means that it is known that no other SEXP shares data with this
>   object, and so it may safely be altered. A value of 1 is used for
>   situations like
>
>       dim(a) <- c(7, 2)
>
>   where in principle two copies of a exist for the duration of the
>   computation as (in principle)
>
>       a <- `dim<-`(a, c(7, 2))
>
>   but for no longer, and so some primitive functions can be optimized to
>   avoid a copy in this case. 
>
>   

so what you quote effectively talks about a specific refcount
mechanism.  it's not refcount that would be used by the garbage
collector, but it's a refcount, or maybe refflag.
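
a small sketch of the 'call by value' illusion the quoted passage
describes.  the tracemem() part only runs if r was built with memory
profiling, and what it prints depends on the r version, so i show no
output for it:

```r
a <- c(1, 2, 3)

# report duplications of a's value, if this build supports it
traced <- capabilities("profmem")
if (traced) tracemem(a)

b <- a        # no copy needed yet: b is bound to the same value as a
b[1] <- 10    # b is about to be altered, so the value is duplicated first

if (traced) untracemem(a)

a             # still 1 2 3
b             # 10 2 3
```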

>> and in general, this should not matter because it should be
>> unobservable, but it isn't.
>>     
>
> That's your opinion (to which you are entitled).  

yes, that's my opinion:  the effects of implementation tricks should not
be observable by the user, because they can lead to behaviour that is
hard to explain and debug in the user's program.  you surely don't
suggest that all users consult the source code before writing programs
in r.

> Unfortunately (for
> you), the designers of R decided on a design which allows them to
> reduce the number of copies that have to be made.
>   

and that's excellent, only that they failed to hide the mechanism below
the interface.  or maybe they decided not to hide it?

> I was under the impression that you were interested to understand what
> happens if you issue the commands
> 	names(x) <- "foo"
> and
> 	"names<-"(x, "foo")
> and I must agree with Simon, the answer by Peter was explaining it very
> well to someone familiar with the documentation of R.  The fact that
> you found that answer unsatisfactory suggests that you could improve
> your familiarity with the documentation.  

i have indeed learned what prefix 'names<-' does and now i know that the
surprising behaviour is due to the observability of the internal
optimization.
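
a sketch of the two forms side by side.  whether the prefix call also
alters x depends on the r version and on the state of the named field,
so i make no claim about that line's output:

```r
x <- c(1, 2)

# prefix form: the returned value always carries the names
y <- `names<-`(x, c("a", "b"))
names(y)

# whether x itself was touched is the implementation-dependent part
names(x)

# replacement form: x is guaranteed to carry the names afterwards
names(x) <- c("a", "b")
names(x)
```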

thanks to simon, peter, and you for your answers which allowed me to
learn this ugly detail.

vQ