[Rd] Definition of [[

Sun Mar 15 22:30:07 CET 2009

Duncan,

Thanks for the reply.

On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 15/03/2009 2:31 PM, Stavros Macrakis wrote:

>> dput(ll[3])
>> list(NULL)
>> ? i is positive and exceeds length(x); why isn't this list(NA)?
>
> Because the sentence you read was talking about "simple vectors", and ll is
> presumably not a simple vector.  So what is a simple vector?  That is not
> explicitly defined, and it probably should be.  I think it is "atomic
> vectors, except those with a class that has a method for [".

The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
and 3.4.4 Subset assignment, so the context seems to be saying that
"simple vectors" are those which are neither matrices nor arrays, nor
"other structures" which overload [.

Even if the definition of 'simple vector' were clarified to cover only
atomic vectors, I still can't find any text specifying that list(3)[5]
=> list(NULL).
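
For concreteness, here is a minimal sketch of the two cases being
compared. The variable names are mine, chosen only for illustration;
the point is just that the same out-of-range index gives NA for an
atomic vector and list(NULL) for a generic vector.

```r
# Out-of-range extraction with single brackets.
atomic_result <- c(3)[5]      # atomic (numeric) vector, index past the end
list_result   <- list(3)[5]   # generic vector (list), same index

dput(atomic_result)  # NA_real_
dput(list_result)    # list(NULL)
```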

For that matter, it would leave the subscripting of important
built-ins such as factors and dates undefined. Obviously the
intuition is that vectors of factors or vectors of dates would do the
'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
say what that thing is....

>>> ll[[3]]
>>
>> Error in list(1)[[3]] : subscript out of bounds
>> ? Why does this return NA for an atomic vector, but give an error for
>> a generic vector?
>>
>>> cc[[3]] <- 34; dput(cc)
>>
>> c(1, NA, 34)
>> OK
>>
>> ll[[3]] <- 34; dput(ll)
>> list(1, NULL, 34)
>> Why is second element NULL, not NA?
>
> NA is a length 1 atomic vector with a specific type matching the type of c.
>  It makes more sense in this context to put in a NULL, and return a
> list(NULL) for ll[3].

Understood that that's the rationale, but where is it documented?

Also, if that's the rationale, it seems to say that NULL is the
equivalent of NA for list elements, but in fact NULL does not function
like NA:

> is.na(NULL)
logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'
> is.na(list(NULL))
[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other
forms of NA:

> dput(as.integer(as.logical(c(TRUE,NA,TRUE))))
c(1L, NA, 1L)
> dput(as.logical(as.integer(c(TRUE,NA,TRUE))))
c(TRUE, NA, TRUE)

and NAs are not converted to NULL when converted to a generic vector:

> dput(as.list(c(TRUE,NA,TRUE)))
list(TRUE, NA, TRUE)

and NA is preserved when downconverting:

> dput(as.logical(as.list(c(TRUE,NA,23))))
c(TRUE, NA, TRUE)

But if you try to downconvert NULL, you get an error:

> dput(as.integer(list(NULL)))
Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially
since NULL is a perfectly good list element, distinct from NA.
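
To illustrate that last point, a small sketch (the names x, null_slots
and na_slots are hypothetical, not from any documentation): a list can
hold both a NULL and an NA, and is.null and is.na tell them apart.

```r
# A list holding an ordinary value, a NULL, and an NA.
x <- list(1, NULL, NA)

# NULL occupies a slot of its own; the list has three elements.
n <- length(x)

# is.null, applied element by element, flags only the NULL slot.
null_slots <- vapply(x, is.null, logical(1))

# is.na on a list flags only the slot holding a length-one NA.
na_slots <- is.na(x)

dput(null_slots)  # c(FALSE, TRUE, FALSE)
dput(na_slots)    # c(FALSE, FALSE, TRUE)
```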

>> And why is it OK to set an undefined ll[[3]], but not to get it?
>
> Lots of code grows vectors by setting elements beyond the end of them, so
> whether or not that's a good idea, it's not likely to change.

I wasn't suggesting changing this.

> I think an argument could be made that ll[[toobig]] should return NULL
> rather than trigger an error, but on the other hand, the current behaviour
> allows the programmer to choose:  if you are assuming that a particular
> element exists, use ll[[element]], and R will tell you when your assumption
> is wrong.  If you aren't sure, use ll[element] and you'll get NA or
> list(NULL) if the element isn't there.

Yes, that could make sense, but why would it be true for ll[[toobig]]
but not cc[[toobig]]?
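
For reference, the list side of that choice looks roughly like this.
It is only a sketch: the tryCatch wrapper and the variable names are
mine, and it deliberately covers just the generic-vector case, not
cc[[toobig]].

```r
ll <- list(1)

# Single bracket past the end: no error, a placeholder comes back.
single <- ll[3]

# Double bracket past the end: an error, captured here as its message.
double <- tryCatch(ll[[3]], error = function(e) conditionMessage(e))

# Assigning past the end is allowed and pads the gap with NULL.
ll[[3]] <- 34

dput(single)  # list(NULL)
dput(double)  # "subscript out of bounds"
dput(ll)      # list(1, NULL, 34)
```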

>> I assume that these are features, not bugs, but I can't find
>> documentation for them.

> There is more documentation in the man page for Extract, but I think it is
> incomplete.

Yes, I was looking at that man page, and I don't think it resolves any
of the above questions.

> The most complete documentation is of course the source code,
> but it may not answer the question of what's intentional and what's
> accidental.

Well, that's one issue.  But another is that there should be a
specification addressed to users, who should not have to understand
internals.

             -s