[Rd] Surprising behavior of letters[c(NA, NA)]

(Ted Harding) ted.harding at wlandres.net
Fri Dec 17 16:40:14 CET 2010


On 17-Dec-10 14:32:18, Gabor Grothendieck wrote:
> Consider this:
> 
>> letters[c(2, 3)]
> [1] "b" "c"
>> letters[c(2, NA)]
> [1] "b" NA
>> letters[c(NA, 3)]
> [1] NA  "c"
>> letters[c(NA, NA)]
>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA
> [26] NA
> 
> The result is a 2-vector in each case until we get to c(NA, NA) and
> then it unexpectedly changes from returning a 2-vector to returning a
> 26-vector.  I think most people would have expected that the answer
> would be c(NA, NA).

I'm not sure that it is surprising! Consider
  letters[NA]
which returns exactly the same result. Then consider that 'letters' is
simply a 26-element character vector c("a",...). Now consider

  x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
  x[NA]
  # [1] NA NA NA NA NA NA NA NA NA NA NA NA NA

In other words, x[NA] for any vector x will test each index 1:length(x)
against NA, and will find that it's NA, since it doesn't know whether
the index matches or not. Therefore it returns NA for that index, and
will do the same for every index. So it's telling you: "For each of my
elements a,b,c,d,e,f,... I have to tell you that I don't know whether
you want it or not". You also get similar behavior for x==NA.
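
The lengths involved are easy to check directly. A minimal sketch in
base R, re-using the same 13-element x as above (the names r1 and r2
are just illustrative):

```r
x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)

# Indexing with a bare NA gives one NA per element of x
r1 <- x[NA]
length(r1)        # 13
all(is.na(r1))    # TRUE

# Comparing with NA likewise gives one NA per element of x
r2 <- x == NA
length(r2)        # 13
all(is.na(r2))    # TRUE
```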

If anything is surprising (though it too admits a logical
explanation), it is the result

  letters[c(2, NA)]
  # [1] "b" NA

since the result being asked for by the first element of c(2,NA) is
definite -- so far so good -- but then you would expect it to have the
same problem with what is being asked for by NA. This time, it seems
that because the 2-element vector c(2,NA) is being submitted, its
length over-rides the length of the response that would be given for
x[NA]: "You asked for a 2-element extraction from letters; I can see
what you want for the first, but not for the second".

However, that logic does not work for letters[c(NA,NA)] which still
returns the 26-element result!

After all that, I'm inclined to the view that letters[NA] should
return one element (NA), letters[c(NA,NA)] should return 2 (NA,NA),
etc.; and that the same should apply to all vectors accessed by [].
The above behaviour seems to contradict [what I can understand from]
what is said in ?"[":

NAs in indexing:
     When extracting, a numerical, logical or character 'NA' index
     picks an unknown element and so returns 'NA' in the corresponding
     element of a logical, integer, numeric, complex or character
     result, and 'NULL' for a list.  (It returns '00' for a raw
     result.)

since that seems to imply that x[c(NA,NA)] should return c(NA,NA)
and not rep(NA,length(x))!
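
One possible check is whether the type of the index plays a part,
since a bare NA is a logical constant while c(2, NA) is numeric. The
sketch below is only a probe in base R, using the typed constant
NA_integer_; it is not taken from the help text quoted above.

```r
# A bare NA is logical; combining it with a number gives a numeric vector
typeof(NA)           # "logical"
typeof(c(NA, NA))    # "logical"
typeof(c(2, NA))     # "double"

# Integer-typed NA indices give one result element per index element
length(letters[NA_integer_])                  # 1
length(letters[c(NA_integer_, NA_integer_)])  # 2

# Logical NA indices give a result as long as 'letters'
length(letters[NA])          # 26
length(letters[c(NA, NA)])   # 26
```

This would be consistent with a logical index being recycled to the
length of the vector, which is how ?"[" describes logical indices.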

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Dec-10                                       Time: 15:40:03
------------------------------ XFMail ------------------------------


