[R] Summary of Character vectors, NA's and "" in merges

David Kane <a296180 at mica.fmr.com>
Fri Sep 28 18:04:17 CEST 2001


Thanks to Brian Ripley, Gregory Warnes, and Dennis Murphy for considering my
problem about "NA" in character strings. The nub of the issue seems to be that
you cannot have the string "NA" in a character vector in R without it
being interpreted as meaning NA (i.e., not available). The only work-arounds
involve renames of various sorts.

Perhaps this is more appropriate for r-devel, but I was wondering what the
future holds for character vectors in R, i.e., will this always be a
problem? Although I am not smart enough to understand the Green Book, there is
a discussion following page 200 that *seems* to suggest that the usage of a
string class may make it easier to deal with this issue.

Is there anything coming down the pike on this point?


Greg suggested:

Perhaps the simplest thing would be to change occurrences of "NA" (meaning
Nabisco) to something similar like "NA." before placing the variable in a
dataframe....

> a <- data.frame(x = 1:4)
> y <- c("NA","a","b")
> y[y=="NA"] <- "NA."
> b <- data.frame(x = 1:3, y = y)
> merge(a, b, all.x = TRUE)
  x   y
1 1 NA.
2 2   a
3 3   b
4 4  NA

This isn't very clean, but it's simple...
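
One way the rename might be wrapped up so that the placeholder cannot collide
with a value already in the data. This is only a sketch and not something Greg
proposed: the function name rename_na_string and its placeholder argument are
made up for illustration, and paste0() assumes a reasonably recent R.

```r
# Hypothetical helper (not from the thread): replace the literal string "NA"
# with a placeholder that does not already occur in the vector.
rename_na_string <- function(v, placeholder = "NA.") {
  v <- as.character(v)
  # Lengthen the placeholder until it is not already one of the values.
  while (placeholder %in% v) {
    placeholder <- paste0(placeholder, ".")
  }
  # which() drops any genuinely missing entries from the comparison.
  v[which(v == "NA")] <- placeholder
  v
}

# Same merge as above, with the rename done by the helper.
a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = rename_na_string(c("NA", "a", "b")))
merged <- merge(a, b, all.x = TRUE)
```

The row that has no match in b still comes back as a true missing value, while
the renamed entry stays an ordinary string.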

Dennis suggested:

This might be a little naive, but...since R *is* case sensitive, would
"Na" for Nabisco be a workable substitute?

> str <- c("A","B","NA","Na","NA")
> which(is.na(str))
[1] 3 5


Thanks again to all,

Dave