[Rd] Confused about NAMED

Matthew Dowle mdowle at mdowle.plus.com
Thu Nov 24 19:48:09 CET 2011


>
> On Nov 24, 2011, at 8:05 AM, Matthew Dowle wrote:
>
>>>
>>> On Nov 24, 2011, at 12:34 , Matthew Dowle wrote:
>>>
>>>>>
>>>>> On Nov 24, 2011, at 11:13 , Matthew Dowle wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I expected NAMED to be 1 in all these three cases. It is for one of
>>>>>> them, but not the other two?
>>>>>>
>>>>>>> R --vanilla
>>>>>> R version 2.14.0 (2011-10-31)
>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>
>>>>>>> x = 1L
>>>>>>> .Internal(inspect(x))   # why NAM(2)? expected NAM(1)
>>>>>> @2514aa0 13 INTSXP g0c1 [NAM(2)] (len=1, tl=0) 1
>>>>>>
>>>>>>> y = 1:10
>>>>>>> .Internal(inspect(y))   # NAM(1) as expected but why different to x?
>>>>>> @272f788 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
>>>>>>
>>>>>>> z = data.frame()
>>>>>>> .Internal(inspect(z))   # why NAM(2)? expected NAM(1)
>>>>>> @24fc28c 19 VECSXP g0c0 [OBJ,NAM(2),ATT] (len=0, tl=0)
>>>>>> ATTRIB:
>>>>>> @24fc270 02 LISTSXP g0c0 []
>>>>>>  TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>>>>>>  @24fc334 16 STRSXP g0c0 [] (len=0, tl=0)
>>>>>>  TAG: @3f2040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
>>>>>>  @24fc318 13 INTSXP g0c0 [] (len=0, tl=0)
>>>>>>  TAG: @3f2388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
>>>>>>  @25be500 16 STRSXP g0c1 [] (len=1, tl=0)
>>>>>>    @1d38af0 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
>>>>>>
>>>>>> It's a little difficult to search for the word "named" but I tried
>>>>>> and found this in R-ints :
>>>>>>
>>>>>>  "Note that optimizing NAMED = 1 is only effective within a
>>>>>> primitive (as the closure wrapper of a .Internal will set NAMED = 2
>>>>>> when the promise to the argument is evaluated)"
>>>>>>
>>>>>> So might it be that just looking at NAMED using .Internal(inspect())
>>>>>> is setting NAMED=2?  But if so, why does y have NAMED==1?
>>>>>
>>>>> This is tricky business... I'm not quite sure I'll get it right, but
>>>>> let's try
>>>>>
>>>>> When you are assigning a constant, the value you assign is already
>>>>> part of the assignment expression, so if you want to modify it, you
>>>>> must duplicate. So NAMED==2 on z <- 1 is basically to prevent you from
>>>>> accidentally "changing the value of 1". If it weren't, then you could
>>>>> get bitten by code like for(i in 1:2) {z <- 1; if(i==1) z[1] <- 2}.
>>>>>
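The for() example above can be checked directly. This is only a minimal sketch: the vector res is a name introduced here for illustration and is not part of the thread. Whatever NAMED value the interpreter keeps internally, the visible guarantee should be that modifying z in the first iteration leaves the constant 1 in the loop body alone.

```r
# Record the value of z at the end of each iteration.
# 'res' is a hypothetical helper name, not from the original example.
res <- numeric(2)
for (i in 1:2) {
  z <- 1                  # z is bound to the constant in the expression
  if (i == 1) z[1] <- 2   # must act on a copy, not on the constant itself
  res[i] <- z
}
print(res)
```

If the constant were modified in place, the second iteration would see 2 instead of 1.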
>>>>> If you're assigning the result of a computation, then the object only
>>>>> exists once, so
>>>>> z <- 0+1  gets NAMED==1.
>>>>>
>>>>> However, if the computation is done by returning a named value from
>>>>> within a function, as in
>>>>>
>>>>>> f <- function(){v <- 1+0; v}
>>>>>> z <- f()
>>>>>
>>>>> then again NAMED==2. This is because the side effects of the function
>>>>> _might_ result in something having a hold on the function
>>>>> environment, e.g. if we had
>>>>>
>>>>> e <- NULL
>>>>> f <- function(){e <<-environment(); v <- 1+0; v}
>>>>> z <- f()
>>>>>
>>>>> then z[1] <- 5 would change e$v too. As it happens, there aren't any
>>>>> side effects in the former case, but R loses track and assumes the
>>>>> worst.
>>>>>
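The closure case can be sketched the same way. The code below reuses e, f and z exactly as in the quoted example and only adds the checks at the end. As I understand it, the conservative NAMED==2 on the returned value is what forces a copy here, so the captured environment should keep its own v.

```r
e <- NULL
f <- function() {
  e <<- environment()  # side effect: keep a handle on the function environment
  v <- 1 + 0
  v                    # the returned value is still bound to 'v' inside 'e'
}
z <- f()
z[1] <- 5              # modifies z only; the value is duplicated first

print(z)
print(e$v)
```

Without the duplication, e$v and z would be the same object and the assignment would be visible through both names.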
>>>>
>>>> Thanks a lot, think I follow. That explains x vs y, but why is z
>>>> NAMED==2? The result of data.frame() is an object that exists once
>>>> (similar to 1:10) so shouldn't it be NAMED==1 too?  Or, R loses track
>>>> and assumes the worst even on its own functions such as data.frame()?
>>>
>>> R loses track. I suspect that is really all it can do without actual
>>> reference counting. The function data.frame is more than 150 lines of
>>> code, and if any of those end up invoking user code, possibly via a
>>> class method, you can't tell definitively whether or not the evaluation
>>> environment dies at the return.
>>
>> Ohhh, think I see now. After Duncan's reply I was going to ask if it was
>> possible to change data.frame() to be primitive so it could set NAMED=1.
>> But it seems primitive functions can't use R code so data.frame() would
>> need to be ported to C. Ok! - not quick or easy, and not without
>> considerable risk. And, data.frame() can invoke user code inside it
>> anyway then.
>>
>> Since list() is primitive I tried to construct a data.frame starting
>> with list() [since structure() isn't primitive], but then merely adding
>> an attribute seems to set NAMED==2 too ?
>>
>
> Yes, because attr(x,y) <- z is the same as
>
> `*tmp*` <- x
> x <- `attr<-`(`*tmp*`, y, z)
> rm(`*tmp*`)
>
> so there are two references to the data frame: one in DF and one in
> `*tmp*`. It is the first line that causes the NAMED bump. And, yes, it's
> real:
>
>> `f<-`=function(x,value) { print(ls(parent.frame())); x<-value }
>> x=1
>> f(x)=1
> [1] "*tmp*" "f<-"   "x"
>
> You could skip that by using the function directly (I don't think it's
> recommended, though):
>
>> .Internal(inspect(l <- list(a=1)))
> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>   @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
> ATTRIB:
>   @100b6e748 02 LISTSXP g0c0 []
>     TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>     @1028c82c8 16 STRSXP g0c1 [] (len=1, tl=0)
>       @1009cd388 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
>> .Internal(inspect(`names<-`(l, "b")))
> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>   @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
> ATTRIB:
>   @100b6e748 02 LISTSXP g0c0 []
>     TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>     @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
>       @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
>> .Internal(inspect(l))
> @1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
>   @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
> ATTRIB:
>   @100b6e748 02 LISTSXP g0c0 []
>     TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
>     @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
>       @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
>

Interesting, I tried it. I found that setting the "row.names" attribute
that way keeps NAMED==1 ok, and that setting "class" attribute keeps
NAMED==1 ok too. Fantastic! But, it seems that merely printing it on the
console (when the class is set) bumps NAMED to 2. Here is the output :

> DF = list(a=1:3,b=4:6)
> `attr<-`(DF,"row.names",.set_row_names(3))
$a
[1] 1 2 3

$b
[1] 4 5 6

attr(,"row.names")
[1] 1 2 3
> .Internal(inspect(DF))    # great, NAM(1)
@261e730 19 VECSXP g0c1 [NAM(1),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
> .Internal(inspect(`attr<-`(DF,"class","data.frame")))
@261e730 19 VECSXP g0c1 [OBJ,NAM(1),ATT] (len=2, tl=0)  # great, NAM(1)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
> .Internal(inspect(DF))         # Great, NAM(1) still
@261e730 19 VECSXP g0c1 [OBJ,NAM(1),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
> DF
  a b
1 1 4
2 2 5
3 3 6
> .Internal(inspect(DF))  # just looking at it changes NAMED to 2 ?
@261e730 19 VECSXP g0c1 [OBJ,MARK,NAM(2),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [MARK,NAM(2)] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [MARK,NAM(2)] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 [MARK]
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [MARK,NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [MARK,NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [MARK,NAM(2)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"

> identical(DF, data.frame(a=1:3,b=4:6))
[1] TRUE
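
For reference, here is the construction above as a self-contained sketch. It assigns the result of each `attr<-` call instead of relying on DF being modified in place, because (an assumption on my part) whether the direct call modifies its argument may depend on the R version and on the NAMED value at that moment. DF2 is a name introduced here for illustration only.

```r
# Start from a plain list; list() is primitive.
DF <- list(a = 1:3, b = 4:6)

# Call the replacement functions directly, skipping the `*tmp*` mechanism.
# .set_row_names(3L) produces the compact internal row names for 3 rows.
DF2 <- `attr<-`(DF, "row.names", .set_row_names(3L))
DF2 <- `attr<-`(DF2, "class", "data.frame")

print(is.data.frame(DF2))
print(identical(DF2, data.frame(a = 1:3, b = 4:6)))

# Interactively, .Internal(inspect(DF2)) shows the object header; the exact
# output differs between R versions, so it is not reproduced here.
```

The comparison with data.frame() matches the identical() result shown above.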

Matthew


