[Rd] Definition of [[

Duncan Murdoch murdoch at stats.uwo.ca
Mon Mar 16 00:46:45 CET 2009


Just a couple of inline comments down below:

On 15/03/2009 5:30 PM, Stavros Macrakis wrote:
> Duncan,
> 
> Thanks for the reply.
> 
> On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 15/03/2009 2:31 PM, Stavros Macrakis wrote:
> 
>>> dput(ll[3])
>>> list(NULL)
>>> ? i is positive and exceeds length(x); why isn't this list(NA)?
>> Because the sentence you read was talking about "simple vectors", and ll is
>> presumably not a simple vector.  So what is a simple vector?  That is not
>> explicitly defined, and it probably should be.  I think it is "atomic
>> vectors, except those with a class that has a method for [".
> 
> The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
> 3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
> and 3.4.4 Subset assignment, so the context seems to be saying that
> "simple vectors" are those which are not matrices or arrays, and those
> ("other structures") which do not overload [.
> 
> Even if the definition of 'simple vector' were clarified to cover only
> atomic vectors, I still can't find any text specifying that list(3)[5]
> => list(NULL).
> 
> For that matter, it would leave the subscripting of important
> built-ins such as factors and dates, etc. undefined. Obviously the
> intuition is that vectors of factors or vectors of dates would do the
> 'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
> say what that thing is....
> 
>>>> ll[[3]]
>>> Error in list(1)[[3]] : subscript out of bounds
>>> ? Why does this return NA for an atomic vector, but give an error for
>>> a generic vector?
>>>
>>>> cc[[3]] <- 34; dput(cc)
>>> c(1, NA, 34)
>>> OK
>>>
>>> ll[[3]] <- 34; dput(ll)
>>> list(1, NULL, 34)
>>> Why is second element NULL, not NA?
>> NA is a length 1 atomic vector with a specific type matching the type of cc.
>>  It makes more sense in this context to put in a NULL, and return a
>> list(NULL) for ll[3].
> 
> Understood that that's the rationale, but where is it documented?
> 
> Also, if that's the rationale, it seems to say that NULL is the
> equivalent of NA for list elements, but in fact NULL does not function
> like NA:
> 
>> is.na(NULL)
> logical(0)
> Warning message:
> In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'
>> is.na(list(NULL))
> [1] FALSE
> 
> Indeed, NA seems to both up-convert and down-convert nicely to other
> forms of NA:
> 
>> dput(as.integer(as.logical(c(TRUE,NA,TRUE))))
> c(1L, NA, 1L)
>> dput(as.logical(as.integer(c(TRUE,NA,TRUE))))
> c(TRUE, NA, TRUE)
> 
> and NAs are not converted to NULL when converted to a generic vector:
> 
>> dput(as.list(c(TRUE,NA,TRUE)))
> list(TRUE, NA, TRUE)
> 
> and NA is preserved when downconverting:
> 
>> dput(as.logical(as.list(c(TRUE,NA,23))))
> c(TRUE, NA, TRUE)
> 
> But if you try to downconvert NULL, you get an error:
> 
>> dput(as.integer(list(NULL)))
> Error in isS4(x) : (list) object cannot be coerced to type 'integer'
> 
> So I don't see why NULL is the right way to represent NA, especially
> since NULL is a perfectly good list element, distinct from NA.
> 
>>> And why is it OK to set an undefined ll[[3]], but not to get it?
>> Lots of code grows vectors by setting elements beyond the end of them, so
>> whether or not that's a good idea, it's not likely to change.
> 
> I wasn't suggesting changing this.
> 
>> I think an argument could be made that ll[[toobig]] should return NULL
>> rather than trigger an error, but on the other hand, the current behaviour
>> allows the programmer to choose:  if you are assuming that a particular
>> element exists, use ll[[element]], and R will tell you when your assumption
>> is wrong.  If you aren't sure, use ll[element] and you'll get NA or
>> list(NULL) if the element isn't there.
> 
> Yes, that could make sense, but why would it be true for ll[[toobig]]
> but not cc[[toobig]]?

But it is:

 > cc <- c(1)
 > cc[[3]]
Error in cc[[3]] : subscript out of bounds
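
For reference, the out-of-bounds cases discussed in this thread can be put side by side. This is only a minimal sketch of the behaviour reported above, not a specification; `try_read` is a hypothetical helper introduced for illustration and is not part of base R.

```r
# Hypothetical helper: evaluate an expression and return the string
# "error" if evaluation signals an error.
try_read <- function(expr) tryCatch(expr, error = function(e) "error")

cc <- c(1)      # atomic vector of length 1
ll <- list(1)   # generic vector (list) of length 1

# Reading past the end with [ pads with a "missing" element.
r1 <- cc[3]     # NA
r2 <- ll[3]     # list(NULL)

# Reading past the end with [[ signals "subscript out of bounds"
# for the atomic vector and for the list alike.
r3 <- try_read(cc[[3]])
r4 <- try_read(ll[[3]])

# Assigning past the end grows the vector, filling the gap.
cc[[3]] <- 34   # c(1, NA, 34)
ll[[3]] <- 34   # list(1, NULL, 34)
```

So the atomic/list difference shows up only in what fills the gap (NA versus NULL); whether an out-of-bounds read is an error depends on `[` versus `[[`, not on the kind of vector.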

>>> I assume that these are features, not bugs, but I can't find
>>> documentation for them.
> 
>> There is more documentation in the man page for Extract, but I think it is
>> incomplete.
> 
> Yes, I was looking at that man page, and I don't think it resolves any
> of the above questions.
> 
>> The most complete documentation is of course the source code,
>> but it may not answer the question of what's intentional and what's
>> accidental.
> 
> Well, that's one issue.  But another is that there should be a
> specification addressed to users, who should not have to understand
> internals.

I agree, but not so strongly that I will drop everything and write one.

Duncan Murdoch


