[Rd] surprising behaviour of names<-

Thu Mar 12 15:21:50 CET 2009

Berwin A Turlach wrote:
> On Thu, 12 Mar 2009 10:53:19 +0100
> Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>   
>> well, ?'names<-' says:
>>
>> "
>> Value:
>>      For 'names<-', the updated object. 
>> "
>>
>> which is only partially correct, in that the value will sometimes be
>> an updated *copy* of the object.
>>     
>
> But since R supposedly 

*supposedly*

> uses call-by-value (though we know how to
> circumvent that, don't we?) 

we know how a lot of built-ins hack around this, don't we, and we also
know that call-by-value is not really the argument passing mechanism in r.

> wouldn't you always expect that a copy of
> the object is returned?
>   

indeed!  that's what i have said previously, no?  there is still space
for the smart (i mean it) copy-on-assignment behaviour, but it should
not be visible to the user, in particular, not in that 'names<-'
destructively modifies the object it is given when the refcount is 1. 
in my humble opinion, there is either a design flaw or a bug here.

>  
>   
>>> And the R Language manual (ignoring for the moment that it is a
>>> draft and all that), 
>>>       
>> since we must...
>>
>>     
>>> clearly states that 
>>>
>>> 	names(x) <- c("a","b")
>>>
>>> is equivalent to
>>> 	
>>> 	'*tmp*' <- x
>>>          x <- "names<-"('*tmp*', value=c("a","b"))
>>>   
>>>       
>> ... and?  
>>     
>
> This seems to suggest 

seems to suggest?  isn't the purpose of documentation to specify clearly,
ideally beyond any doubt, what is to be specified?

> that in this case the infix and prefix syntax
> is not equivalent as it does not say that 
>   

are you suggesting fortune telling from what the docs do *not* say?

> 	names(x) <- c("a","b")
> is equivalent to
> 	x <- "names<-"(x, value=c("a","b"))
> and I was commenting on the claim that the infix syntax is equivalent
> to the prefix syntax.
>
>   
>> does this say anything about what 'names<-'(...) actually
>> returns?  updated *tmp*, or a copy of it?
>>     
>
> Since R uses pass-by-value, 

since?  it doesn't!

> you would expect the latter, wouldn't
> you?  

yes, that's what i'd expect in a functional language.

> If you entertain the idea that 'names<-' updates *tmp* and
> returns the updated *tmp*, then you believe that 'names<-' behaves in a
> non-standard way and should take appropriate care.
>   

i got lost in your argumentation.  i have given examples of where
'names<-' destructively modifies and returns the updated object, not a
copy.  what is your point here?

> And the fact that a variable *tmp* is used hints to the fact that
> 'names<-' might have side-effect.  

are you suggesting fortune telling from the fact that a variable *tmp*
is used?

> If 'names<-' has side effects,
> then it might not be well defined with what value x ends up with if
> one executes:
> 	x <- 'names<-'(x, value=c("a","b"))  
>   

not really, unless you mean the returned object in the referential sense
(memory location) rather than its value.  here x will obviously have
the value of the original x plus the names, *but* indeed you cannot tell
from this snippet whether after the assignment x will be the same,
though updated, object or will rather be an updated copy:

    x = c(1)
    x = 'names<-'(x, 'foo')
    # x is the same object

    x = c(1)
    y = x
    x = 'names<-'(x, 'foo')
    # x is another object

so, as you say, it is not well defined which object x will end up with as
its value, though the value of the object visible to the user is well
defined.  rewrite the above and play:

    x = c(1)
    y = 'names<-'(x, 'foo')
    names(x)

what are the names of x?  is y identical (sensu reference) with x, is y
different (sensu reference) but indiscernible (sensu value) from x, or
is y different (sensu value) from x in that y has names and x doesn't?
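one way to probe these three possibilities is a small sketch like the
one below.  it assumes an R build with memory profiling enabled (checked
with capabilities("profmem")), since tracemem() only reports
duplications in such builds.  whether names(x) comes back as NULL or
"foo" depends on the R version, so no output is claimed for it here.

```r
x <- c(1)
# tracemem() marks x so that R prints a message whenever this object is
# duplicated; it is only available when R was built with memory profiling
if (capabilities("profmem")) tracemem(x)

y <- `names<-`(x, "foo")

# y always carries the new names
print(names(y))
# if this prints "foo", x was modified in place and y is the same object;
# if it prints NULL, names<- worked on a copy (and tracemem, if enabled,
# should have reported the duplication)
print(names(x))

if (capabilities("profmem")) untracemem(x)
```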

> This is similar to the discussion what value i should have in the
> following C snippet:
> 	i = 0;
>  	i += i++;
>   

nonsense, it's a *completely* different issue.  here you touch the issue
of the order of evaluation, and not of whether an object is copied or
modified;  above, the inverse is true.

in fact, your example is useless because the result here is clearly
specified by the semantics (as far as i know -- prove me wrong).  you
look up i (0) and i (0) (the order does not matter here), add these
values (0), assign to i (0), and increase i (1). 

i have a better example for you:

    int i = 0;
    i += ++i - ++i;

which will give different final values for i in c (2 with gcc 4.2, 1
with gcc 3.4), c# and java (-1), perl (2) and php (1).  again, this has
nothing to do with the above.

>  
> [..]
>   
>>> I am not sure whether R ever behaved in that way, but as Peter
>>> pointed out, this would be quite undesirable from a memory
>>> management and performance point of view.  
>>>       
>> why?  you can still use the infix names<- with destructive semantics
>> to avoid copying. 
>>     
>
> I guess that would require a rewrite (or extension) of the parser.  To
> me, Section 10.1.2 of the Language Definition manual suggests that once
> an expression is parsed, you cannot distinguish any more whether
> 'names<-' was called using infix syntax or prefix syntax.
>   

but this must be nonsense, since:

    x = 1
    'names<-'(x, 'foo')
    names(x)
    # NULL

    x = 1
    names(x) <- 'foo'
    names(x)
    # "foo"

clearly, there is not only syntactic difference here.  but it might be
that 10.1.2 does not suggest anything like what you say.
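for comparison, the *tmp* expansion quoted earlier from the R Language
manual can be typed out by hand at top level.  unlike the bare prefix
call above, it assigns the result back to x.  this is only a sketch that
mimics the documented rewriting, not how the evaluator actually
implements the infix form.

```r
x <- 1
# spell out names(x) <- 'foo' following the manual's *tmp* expansion
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = "foo")
rm(`*tmp*`)
print(names(x))
# [1] "foo"
```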

> Thus, I guess you want to start a discussion with R Core whether it is
> worthwhile to change the parser such that it keeps track of whether a
> function was used with infix notation or prefix notation and to
> provide for most (all?) assignment operators implementations that use
> destructive semantics if the infix version was used and always copy if
> the prefix notation is used. 
>   

as i explained a few months ago, i study r to find examples of bad
design.  if anyone in the r core is interested in having the problems i
report fixed, i'm happy to get involved in a discussion about the design
and implementation.  if not, i'm happy with just pointing out the issues.

cheers,
vQ