[Rd] surprising behaviour of names<-

Thu Mar 12 16:12:40 CET 2009

On Thu, 12 Mar 2009 15:21:50 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

[...]   
> >>> And the R Language manual (ignoring for the moment that it is a
> >>> draft and all that), 
> >>>       
> >> since we must...
> >>
> >>     
> >>> clearly states that 
> >>>
> >>> 	names(x) <- c("a","b")
> >>>
> >>> is equivalent to
> >>> 	
> >>> 	'*tmp*' <- x
> >>>          x <- "names<-"('*tmp*', value=c("a","b"))
> >>>   
> >>>       
> >> ... and?  
> >>     
> >
> > This seems to suggest 
> 
> seems to suggest?  is not the purpose of documentation to clearly,
> ideally beyond any doubt, specify what is to be specified?

The R Language Definition manual is still a draft. :)

> > that in this case the infix and prefix syntax
> > is not equivalent as it does not say that 
> >   
> 
> are you suggesting fortune telling from what the docs do *not* say?

My experience is that sometimes you have to realise what is not
stated.  I remember a discussion with somebody who asked why he could
not run, on Windows, R CMD INSTALL on a *.zip file.  I pointed out to
him that the documentation states that you can run R CMD INSTALL on
*.tar.gz or *.tgz files and, thus, there should be no expectation that
it can be run on a *.zip file.

YMMV, but when I read a passage like this in R documentation, I start
to wonder why it is stated that 
	names(x) <- c("a","b")
is equivalent to 
	`*tmp*` <- x
	x <- "names<-"(`*tmp*`, value=c("a","b"))
and the simpler construct
	x <- "names<-"(x, value=c("a", "b"))
is not used.  There must be a reason, nobody likes to type
unnecessarily long code.  And, after thinking about this for a while,
the penny might drop.
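
For what it is worth, the equivalence can be tried out directly.  The
following is only a sketch of the expansion as I read the manual; the
variable names x and y are made up for illustration, and the backquotes
are needed because *tmp* is not a syntactic name.

```r
# Infix form.
x <- c(1, 2)
names(x) <- c("a", "b")

# The expansion described in the manual, written out by hand.
y <- c(1, 2)
`*tmp*` <- y
y <- "names<-"(`*tmp*`, value = c("a", "b"))
rm(`*tmp*`)

identical(x, y)  # TRUE
```

This only shows that both forms give the same result; it does not by
itself explain why the temporary variable is there.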

[...] 
> >> does this say anything about what 'names<-'(...) actually
> >> returns?  updated *tmp*, or a copy of it?
> >>     
> >
> > Since R uses pass-by-value, 
> 
> since?  it doesn't!

For all practical purposes it does, as long as standard evaluation is
used.  One just has to be aware that some functions evaluate their
arguments in a non-standard way.  
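
A small sketch of what I mean by the two cases.  The function names
set_first and show_arg are hypothetical, invented for this example only.

```r
# Standard evaluation: the function works on its own copy of the argument,
# so the caller's object is left untouched.
set_first <- function(v) {
  v[1] <- 99
  v
}
a <- c(1, 2, 3)
b <- set_first(a)
a  # still 1 2 3
b  # 99 2 3

# Non-standard evaluation: the function inspects the unevaluated argument
# expression instead of its value.
show_arg <- function(arg) deparse(substitute(arg))
show_arg(a + 1)  # "a + 1"
```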

[...]
> > If you entertain the idea that 'names<-' updates *tmp* and
> > returns the updated *tmp*, then you believe that 'names<-' behaves
> > in a non-standard way and should take appropriate care. 
> 
> i got lost in your argumentation.  [..]

I was commenting on "does this say anything about what 'names<-'(...)
actually returns?  updated *tmp*, or a copy of it?"

As I said, if you entertain the idea that 'names<-' returns an updated
*tmp*, then you believe that 'names<-' behaves in a non-standard way
and appropriate care has to be taken.

> > And the fact that a variable *tmp* is used hints to the fact that
> > 'names<-' might have side-effect.  
> 
> are you suggesting fortune telling from the fact that a variable *tmp*
> is used?

Nothing to do with fortune telling.  One reads the manual, one wonders
why this construct is used instead of an apparently much simpler
one, one reflects and investigates, one realises why the given
construct is stated as the equivalent: because "names<-" has
side-effects.
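
Whether a prefix call leaves its argument alone can be checked on the R
version at hand.  A minimal sketch, with x, y and z as illustrative
names only; I deliberately make no claim about what names(x) shows on
every version of R.

```r
# Prefix call: keep the return value in y and then look at x itself.
x <- 1
y <- "names<-"(x, value = "foo")
names(y)  # "foo"
names(x)  # compare this with the state of x before the call

# Infix form: the result is assigned back to z.
z <- 1
names(z) <- "foo"
names(z)  # "foo"
```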

> > This is similar to the discussion what value i should have in the
> > following C snippet:
> > 	i = 0;
> >  	i += i++;
> >   
> 
> nonsense, it's a *completely* different issue.  here you touch the
> issue of the order of evaluation, and not of whether an object is
> copied or modified;  above, the inverse is true.

Sorry, there was a typo above.  The second statement should have been
	i = i++;

Then on some abstract level they are the same; an object appears on the
left hand side of an assignment but is also modified in the expression
assigned to it.  So what value should it end up with?

> >> why?  you can still use the infix names<- with destructive
> >> semantics to avoid copying. 
> >>     
> >
> > I guess that would require a rewrite (or extension) of the parser.
> > To me, Section 10.1.2 of the Language Definition manual suggests
> > that once an expression is parsed, you cannot distinguish any more
> > whether 'names<-' was called using infix syntax or prefix syntax.
> >   
> 
> but this must be nonsense, since:
> 
>     x = 1
>     'names<-'(x, 'foo')
>     names(x)
>     # NULL
> 
>     x = 1
>     names(x) <- 'foo'
>     names(x)
>     # "foo"
> 
> clearly, there is not only syntactic difference here.  but it might be
> that 10.1.2 does not suggest anything like what you say.

Please tell me how this example contradicts my reading of 10.1.2 that
the expressions 
	'names<-'(x, 'foo')
and
	names(x) <- 'foo'
once they are parsed, produce exactly the same parse tree and that it
becomes impossible to tell from the parse tree whether originally the
infix syntax or the prefix syntax was used.  In fact, the last sentence
in section 10.1.2 strongly suggests to me that the parse tree stores
all function calls as if prefix notation was used.  But it is probably
my English again.....
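
The general statement in 10.1.2 can at least be looked at from within
R, since quote() returns the parsed but unevaluated call.  A small
sketch, assuming nothing beyond base R; e is just an illustrative name.

```r
# Infix and prefix spellings of the same call give identical parse trees.
identical(quote(x + y), quote(`+`(x, y)))  # TRUE

# An assignment written with infix syntax is stored as a call whose
# first element is the symbol `<-`.
e <- quote(names(x) <- "foo")
is.call(e)                         # TRUE
identical(e[[1]], as.name("<-"))   # TRUE
as.list(e)                         # the function and its two arguments
```

Looking at as.list(e) for each expression of interest shows what the
parser actually stored for it.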

> > Thus, I guess you want to start a discussion with R Core whether it
> > is worthwhile to change the parser such that it keeps track on
> > whether a function was used with infix notation or prefix notation
> > and to provide for most (all?) assignment operators implementations
> > that use destructive semantics if the infix version was used and
> > always copy if the prefix notation is used. 
> >   
> 
> as i explained a few months ago, i study r to find examples of bad
> design.  if anyone in the r core is interested in having the problems
> i report fixed, 

Well, whether something is bad design and/or is a problem is in the eye
of the beholder.

Cheers,

	Berwin