[Rd] Confused about NAMED

Simon Urbanek simon.urbanek at r-project.org
Thu Nov 24 20:31:20 CET 2011


On Nov 24, 2011, at 1:48 PM, Prof Brian Ripley wrote:

> On Thu, 24 Nov 2011, Simon Urbanek wrote:
> 
>> 
>> On Nov 24, 2011, at 8:05 AM, Matthew Dowle wrote:
>> 
>>>> 
>>>> On Nov 24, 2011, at 12:34 , Matthew Dowle wrote:
>>>> 
>>>>>> 
>>>>>> On Nov 24, 2011, at 11:13 , Matthew Dowle wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I expected NAMED to be 1 in all these three cases. It is for one of them,
>>>>>>> but not the other two?
>>>>>>> 
>>>>>>>> R --vanilla
>>>>>>> R version 2.14.0 (2011-10-31)
>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>> 
>>>>>>>> x = 1L
>>>>>>>> .Internal(inspect(x))   # why NAM(2)? expected NAM(1)
>>>>>>> @2514aa0 13 INTSXP g0c1 [NAM(2)] (len=1, tl=0) 1
>>>>>>> 
>>>>>>>> y = 1:10
>>>>>>>> .Internal(inspect(y))   # NAM(1) as expected but why different to x?
>>>>>>> @272f788 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
>>>>>>> 
>>>>>>>> z = data.frame()
>>>>>>>> .Internal(inspect(z))   # why NAM(2)? expected NAM(1)
>>>>>>> @24fc28c 19 VECSXP g0c0 [OBJ,NAM(2),ATT] (len=0, tl=0)
>>>>>>> ATTRIB:
>>>>>>> @24fc270 02 LISTSXP g0c0 []
>>>>>>> TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>>>>>> @24fc334 16 STRSXP g0c0 [] (len=0, tl=0)
>>>>>>> TAG: @3f2040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
>>>>>>> @24fc318 13 INTSXP g0c0 [] (len=0, tl=0)
>>>>>>> TAG: @3f2388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
>>>>>>> @25be500 16 STRSXP g0c1 [] (len=1, tl=0)
>>>>>>>   @1d38af0 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
>>>>>>> 
>>>>>>> It's a little difficult to search for the word "named" but I tried and
>>>>>>> found this in R-ints :
>>>>>>> 
>>>>>>> "Note that optimizing NAMED = 1 is only effective within a primitive
>>>>>>> (as the closure wrapper of a .Internal will set NAMED = 2 when the
>>>>>>> promise to the argument is evaluated)"
>>>>>>> 
>>>>>>> So might it be that just looking at NAMED using .Internal(inspect()) is
>>>>>>> setting NAMED=2?  But if so, why does y have NAMED==1?
>>>>>> 
>>>>>> This is tricky business... I'm not quite sure I'll get it right, but
>>>>>> let's try.
>>>>>> 
>>>>>> When you are assigning a constant, the value you assign is already part of
>>>>>> the assignment expression, so if you want to modify it, you must
>>>>>> duplicate. So NAMED==2 on z <- 1 is basically to prevent you from
>>>>>> accidentally "changing the value of 1". If it weren't, then you could get
>>>>>> bitten by code like for(i in 1:2) {z <- 1; if(i==1) z[1] <- 2}.
>>>>>> 
>>>>>> If you're assigning the result of a computation, then the object only
>>>>>> exists once, so
>>>>>> z <- 0+1  gets NAMED==1.
>>>>>> 
>>>>>> However, if the computation is done by returning a named value from within
>>>>>> a function, as in
>>>>>> 
>>>>>>> f <- function(){v <- 1+0; v}
>>>>>>> z <- f()
>>>>>> 
>>>>>> then again NAMED==2. This is because the side effects of the function
>>>>>> _might_ result in something having a hold on the function environment,
>>>>>> e.g. if we had
>>>>>> 
>>>>>> e <- NULL
>>>>>> f <- function(){e <<-environment(); v <- 1+0; v}
>>>>>> z <- f()
>>>>>> 
>>>>>> then z[1] <- 5 would change e$v too. As it happens, there aren't any side
>>>>>> effects in the former case, but R loses track and assumes the worst.
>>>>>> 
>>>>> 
>>>>> Thanks a lot, think I follow. That explains x vs y, but why is z NAMED==2?
>>>>> The result of data.frame() is an object that exists once (similar to 1:10)
>>>>> so shouldn't it be NAMED==1 too?  Or, R loses track and assumes the worst
>>>>> even on its own functions such as data.frame()?
>>>> 
>>>> R loses track. I suspect that is really all it can do without actual
>>>> reference counting. The function data.frame is more than 150 lines of
>>>> code, and if any of those end up invoking user code, possibly via a class
>>>> method, you can't tell definitively whether or not the evaluation
>>>> environment dies at the return.
>>> 
>>> Ohhh, think I see now. After Duncan's reply I was going to ask if it was
>>> possible to change data.frame() to be primitive so it could set NAMED=1.
>>> But it seems primitive functions can't use R code so data.frame() would
>>> need to be ported to C. Ok! - not quick or easy, and not without
>>> considerable risk. And, data.frame() can invoke user code inside it anyway
>>> then.
> 
> Maybe some review of the 'R Internals' manual about what a primitive function is would be desirable.  Converting such a function to C would ossify it, which is the major reason it has not been done (it has been contemplated).
> 
>>> Since list() is primitive I tried to construct a data.frame starting with
>>> list() [since structure() isn't primitive], but then merely adding an
>>> attribute seems to set NAMED==2 too ?
>>> 
>> 
>> Yes, because attr(x,y) <- z is the same as
>> 
>> `*tmp*` <- x
>> x <- `attr<-`(`*tmp*`, y, z)
>> rm(`*tmp*`)
> 
> Only if it were an interpreted function.
> 
>> so there are two references to the data frame: one in DF and one in `*tmp*`. It is the first line that causes the NAMED bump. And, yes, it's real:
>> 
>>> `f<-`=function(x,value) { print(ls(parent.frame())); x<-value }
>>> x=1
>>> f(x)=1
>> [1] "*tmp*" "f<-"   "x"
> 
> You have just explained why interpreted replacement functions set NAMED=2, but this does not apply to primitives.
> 

It does - see eval.c l1680-2, which causes it to go through do_set, which in turn bumps NAMED. I have responded only to Luke, but I guess I should have included everyone.
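
As a rough illustration of the three-step expansion quoted above (a sketch of the semantics only, not of the evaluator's actual code path), the sugar form and a hand-written expansion should produce identical objects. The variable names x and y are made up for this example.

```r
x <- list(a = 1:3, b = 4:6)
y <- x                                   # second variable for the hand-expanded form

# Sugar form: complex assignment with a primitive replacement function.
attr(x, "foo") <- "bar"

# Hand-expanded form, following the three steps quoted above.
`*tmp*` <- y
y <- `attr<-`(`*tmp*`, "foo", "bar")
rm(`*tmp*`)

# Both routes should give the same object.
print(identical(x, y))
```

This only checks that the two forms agree on the result; it says nothing about NAMED, which has to be read off with .Internal(inspect()) as in the transcripts.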


> To help convince you, consider
> 
>> d <- 1:2
>> attributes(d) <- list(x=13)
>> d
> [1] 1 2
> attr(,"x")
> [1] 13
>> .Internal(inspect(d))
> @11be748 13 INTSXP g0c1 [NAM(1),ATT] (len=2, tl=0) 1,2
> ATTRIB:
>  @1552054 02 LISTSXP g0c0 []
>    TAG: @102b1c0 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
>    @11be768 14 REALSXP g0c1 [] (len=1, tl=0) 13
> 
> Now, as to why attr<- (which is primitive) does what it does you will need to read (and understand) the code.
> 

Because do_attributesgets duplicates (attrib.c l1178), which you can easily see:

> d <- 1:2
> .Internal(inspect(d))
@155aba8 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
> attributes(d) <- list(x=13)
> .Internal(inspect(d))
@15dbe28 13 INTSXP g0c1 [NAM(1),ATT] (len=2, tl=0) 1,2
ATTRIB:
  @16da5a8 02 LISTSXP g0c0 [] 
    TAG: @660008 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
    @15dbe58 14 REALSXP g0c1 [] (len=1, tl=0) 13

Note the different pointer of the value of d now -- do_attributesgets returns a duplicate with NAMED=0 so do_set assignment bumps it to 1.
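
For anyone who wants to compare the addresses programmatically rather than by eye, here is a minimal sketch. The helper name addr is hypothetical, and it assumes the first line of the inspect output starts with "@" followed by the address, as in the transcripts above. Passing d to a closure can itself touch NAMED (see the R-ints note quoted earlier), so this is only suitable for watching addresses, not NAMED. Whether the address actually changes depends on the R version, so the code prints both values rather than asserting that they differ.

```r
# Hypothetical helper: pull the address out of the first line printed by
# .Internal(inspect(x)), assumed to start with "@<hex address>".
addr <- function(x) {
  out <- capture.output(.Internal(inspect(x)))
  sub("^@([0-9a-fA-F]+).*$", "\\1", out[1])
}

d <- 1:2
before <- addr(d)                 # address before setting attributes
attributes(d) <- list(x = 13)
after <- addr(d)                  # address after setting attributes

cat("before:", before, "\n")
cat("after: ", after, "\n")
cat("same object:", identical(before, after), "\n")
```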

Cheers,
Simon



>> 
>> You could skip that by using the function directly (I don't think it's recommended, though):
>> 
>>> .Internal(inspect(l <- list(a=1)))
>> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>> @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
>> ATTRIB:
>> @100b6e748 02 LISTSXP g0c0 []
>>   TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>   @1028c82c8 16 STRSXP g0c1 [] (len=1, tl=0)
>>     @1009cd388 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
>>> .Internal(inspect(`names<-`(l, "b")))
>> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>> @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
>> ATTRIB:
>> @100b6e748 02 LISTSXP g0c0 []
>>   TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>   @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
>>     @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
>>> .Internal(inspect(l))
>> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>> @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
>> ATTRIB:
>> @100b6e748 02 LISTSXP g0c0 []
>>   TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>   @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
>>     @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>>> DF = list(a=1:3,b=4:6)
>>>> .Internal(inspect(DF))     # so far so good: NAM(1)
>>> @25149e0 19 VECSXP g0c1 [NAM(1),ATT] (len=2, tl=0)
>>> @263ea50 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
>>> @263eaa0 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
>>> ATTRIB:
>>> @2457984 02 LISTSXP g0c0 []
>>>   TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>>   @25149c0 16 STRSXP g0c1 [] (len=2, tl=0)
>>>     @1e987d8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
>>>     @1e56948 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
>>>> 
>>>> attr(DF,"foo") <- "bar"    # just adding an attribute sets NAM(2) ?
>>>> .Internal(inspect(DF))
>>> @25149e0 19 VECSXP g0c1 [NAM(2),ATT] (len=2, tl=0)
>>> @263ea50 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
>>> @263eaa0 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
>>> ATTRIB:
>>> @2457984 02 LISTSXP g0c0 []
>>>   TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>>   @25149c0 16 STRSXP g0c1 [] (len=2, tl=0)
>>>     @1e987d8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
>>>     @1e56948 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
>>>   TAG: @245732c 01 SYMSXP g0c0 [] "foo"
>>>   @25148a0 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
>>>     @2514920 09 CHARSXP g0c1 [gp=0x20] "bar"
>>> 
>>> 
>>> Matthew
>>> 
>>> 
>>>> --
>>>> Peter Dalgaard, Professor
>>>> Center for Statistics, Copenhagen Business School
>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>> Phone: (+45)38153501
>>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>> 
>>>> 
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>> 
>> 
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> 


