[Rd] Characters subsetted with NA (was: Several R vs S-Plus issues)

David Brahm brahm@alum.mit.edu
Thu, 4 Oct 2001 11:19:25 -0400 (EDT)


Hello, R-devel!

I posted to R-help, and (inappropriately) to R-bugs, this R/S-Plus difference:
> LETTERS[c(NA,2)] in S is c("","B"), but in R is c("NA","B")

Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> wrote:
> I think we do not want to change this. ...
> R> is.na(LETTERS[c(NA,2)])
>   [1]  TRUE FALSE
> so we really have NA but it is printed as "NA" (and this might be
> another case where <NA> would be better).

This is merely a statement about is.na(), namely that it flags the string
"NA".  There is no distinction in R between the "NA" that arises from
subsetting with a missing value, and the "NA" that represents Nabisco.  Note:
  R> all.equal(LETTERS[c(NA,2)][1], substring("NABISCO",1,2))
  [1] TRUE

R thinks I am missing data when I trade Nabisco, in the Netherlands Antilles,
or through trader Neil Armstrong.  In each case, "" would more likely indicate
a missing value; there is no stock with that ticker, country with that symbol,
or trader with those initials.

I'd think it would be easy to make is.na() flag "" instead of "NA".  However,
I'm stuck at the definition:
  R> is.na
  .Primitive("is.na")
so I don't know how to change it (I'm a lousy developer, you see).

I propose again that we provide options(na.char="NA"), so users can make their
own choice.  Thanks for listening!
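
A minimal user-level sketch of the idea, under stated assumptions:
isMissingChar() is a hypothetical helper (not part of R), and "na.char" is
only the option name proposed here, which R itself does not define or
consult.  It shows how a user-chosen marker string could be flagged as
missing without touching the primitive is.na():

```r
# Hypothetical helper, not part of R: flags elements of a character vector
# that equal a user-chosen missing-value marker.
# "na.char" is an assumed option name; R does not read it on its own.
isMissingChar <- function(x) {
  marker <- getOption("na.char")
  if (is.null(marker)) marker <- "NA"   # fall back to the current convention
  # %in% always yields TRUE/FALSE, so the result is a plain logical vector
  is.character(x) & (x %in% marker)
}

options(na.char = "")              # choose "" as the missing-value marker
tickers <- c("NA", "", "IBM")      # "NA" here is a real ticker, not missing
isMissingChar(tickers)
# [1] FALSE  TRUE FALSE
```

This only covers the test for missingness; printing and subsetting would
still use whatever representation R has built in.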
					-- David Brahm (brahm@alum.mit.edu)