[Rd] Correction to section 1.1.2 of R Internals doc, on NAMED

Wed Aug 25 02:23:10 CEST 2010

I think the explanation of the NAMED field in the R Internals document
is incorrect.  In Section 1.1.2, it says:

  The named field is set and accessed by the SET_NAMED and NAMED macros,
  and take values 0, 1 and 2. R has a `call by value' illusion, so an
  assignment like

       b <- a

  appears to make a copy of a and refer to it as b. However, if neither
  a nor b are subsequently altered there is no need to copy. What really
  happens is that a new symbol b is bound to the same value as a and the
  named field on the value object is set (in this case to 2). When an
  object is about to be altered, the named field is consulted. A value
  of 2 means that the object must be duplicated before being
  changed. (Note that this does not say that it is necessary to
  duplicate, only that it should be duplicated whether necessary or
  not.) A value of 0 means that it is known that no other SEXP shares
  data with this object, and so it may safely be altered. A value of 1
  is used for situations like

       dim(a) <- c(7, 2)

  where in principle two copies of a exist for the duration of the
  computation as (in principle)

       a <- `dim<-`(a, c(7, 2))

  but for no longer, and so some primitive functions can be optimized to
  avoid a copy in this case.

The implication of this somewhat confusing explanation is that values
of variables may have NAMED of 0, and that NAMED will be 1 only
briefly, during a few operations like dim(a) <- c(7,2).  But from my
reading of the R source, this is wrong.  It seems to me that NAMED
will quite often be 1 for extended periods of time.  For instance,
after a <- c(7,2), the value stored in a will have NAMED of 1.  If at
this point a[2] <- 0 is executed, no copy is made, because NAMED is 1.
If b <- a is then executed, the same value will be in both a and b,
and to reflect this, NAMED is incremented to 2.  If a[2] <- 0 is
executed at this point, a copy is made, since NAMED is 2.

Essentially, NAMED is a count of how many variables reference a value,
except that the count is not necessarily accurate.  First, once NAMED
reaches 2, it is not incremented any further.  Second, no attempt is
made to decrement NAMED when a variable ceases to refer to a value.
The end result is that a copy must be made when changing a variable
whose value has NAMED of 2, since some other variable may reference
the same value.
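
To make this reading concrete, here is a minimal sketch in C of the
rule as I have described it.  It is a toy model, not code from the R
source: the Value and Var types, the functions new_value, bind and
set_element, and the copies_made counter are all made up for
illustration, and nothing is ever freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector value; NOT R's real SEXPREC layout. */
typedef struct {
    int named;      /* 0, 1 or 2; never goes above 2 */
    size_t len;
    double *data;
} Value;

/* Toy variable: a binding from a name to a value. */
typedef struct {
    Value *value;
} Var;

/* Counts duplications, purely so the behaviour can be observed. */
static int copies_made = 0;

/* Allocate a fresh value holding a copy of src, with NAMED of 0. */
Value *new_value(const double *src, size_t len) {
    assert(len > 0);
    Value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = malloc(len * sizeof *v->data);
    assert(v->data != NULL);
    memcpy(v->data, src, len * sizeof *v->data);
    v->len = len;
    v->named = 0;
    return v;
}

/* Bind a value to a variable: bump NAMED, but never past 2.
   The NAMED of any previously bound value is deliberately left alone,
   mirroring the "never decremented" behaviour described above. */
void bind(Var *var, Value *v) {
    if (v->named < 2)
        v->named++;
    var->value = v;
}

/* Assign to one element: duplicate first only if NAMED is 2,
   i.e. only if some other variable might share the value. */
void set_element(Var *var, size_t idx, double x) {
    assert(idx < var->value->len);
    if (var->value->named == 2) {
        Value *copy = new_value(var->value->data, var->value->len);
        copies_made++;
        bind(var, copy);    /* the fresh copy ends up with NAMED of 1 */
    }
    var->value->data[idx] = x;
}
```

Under these assumptions, binding a fresh value to a gives NAMED of 1,
and set_element on a then modifies it in place.  After bind(&b,
a.value), NAMED is 2, so the next set_element on a makes a copy and
leaves b's value untouched.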

There seems to be some confusion in the R source on this.  In the
do_for procedure, the value for the for loop variable is set up with
NAMED being 0, though according to my explanation above, it ought to
be set up with NAMED of 1.  A bug is avoided here only because the
procedures for getting values from variables check whether NAMED is 0
and, if so, set it to 1, which is the minimum it ought to be for a
value that's stored in a variable.
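
The fix-up on lookup can be sketched in the same toy style.  Again
this is only an illustration of the behaviour I am describing, not an
excerpt from the R source: Value, Var, bind_raw and get_value are
hypothetical names, and bind_raw stands in for storing the loop
variable's value without touching NAMED.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value and variable types; NOT R's real data structures. */
typedef struct {
    int named;        /* 0, 1 or 2 */
    double payload;
} Value;

typedef struct {
    Value *value;
} Var;

/* Store a value in a variable WITHOUT touching NAMED, so a fresh
   value keeps NAMED of 0 even though a variable now refers to it. */
void bind_raw(Var *var, Value *v) {
    assert(v != NULL);
    var->value = v;
}

/* Fetch a variable's value.  If NAMED is still 0, raise it to 1,
   the minimum that makes sense for a value held by a variable.
   Values with NAMED of 1 or 2 are returned unchanged. */
Value *get_value(Var *var) {
    assert(var->value != NULL);
    if (var->value->named == 0)
        var->value->named = 1;
    return var->value;
}
```

In this model the inconsistency left by bind_raw is invisible to any
code that reaches the value through get_value, which is why no bug
shows up, but the value is still briefly in a state that the counting
interpretation says should not occur.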

Is my understanding of this correct?  Or have I missed something?

   Radford Neal