[R] Inconsistent handling of character NA?

Mon Jul 21 20:29:50 CEST 2003

On Mon, 21 Jul 2003, Raubertas, Richard wrote:

> [R 1.7.1 on Windows XP Pro]
>
> Since R allows missing values for character variables, why
> are NA's not propagated by character manipulation functions?

They are in the development version.

> For example:
>
> > temp <- c("a", NA)
> > temp
> [1] "a" NA
> > is.na(temp)
> [1] FALSE  TRUE
> > paste(temp[1], temp[2])
> [1] "a NA"
> > substr(temp, 1, 1)
> [1] "a" "N"
> > sub("[aA]","b", temp)
> [1] "b"  "Nb"
>
> It seems to me that paste(temp[1], temp[2]) should return
> a single character NA: it asks to concatenate "a" with
> some unknown string, so the result should be unknown as
> well.  This is certainly how numeric NA's are handled.

paste() does not do what you want even in the development version. The
reason is that paste should be able to produce a printable string from NA
input.  This is certainly how numeric NA's are handled by paste(). You
could make a case that this behaviour is what deparse() is for, but I
don't think it's going to be successful.
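
If you do want the propagating behaviour, it is easy to layer on top of paste(). The following is only a minimal sketch: paste_na() is a hypothetical helper name, not a base R function, and it assumes the inputs are non-empty and recycle cleanly against each other.

```r
# Base R: paste() turns NA into the printable string "NA",
# for character and numeric input alike.
paste("a", NA)
paste(1, NA_real_)

# Hypothetical helper (not part of base R): behaves like paste(),
# but returns NA wherever any of the inputs is NA at that position.
paste_na <- function(..., sep = " ") {
  out <- paste(..., sep = sep)
  # Elementwise "is any argument missing here?", recycled like paste()
  missing_any <- Reduce(`|`, lapply(list(...), is.na))
  out[missing_any] <- NA_character_
  out
}

paste_na("a", NA)
paste_na(c("a", "b"), c("x", NA))
```

paste() itself is left alone, so anything that relies on getting a printable string back keeps working.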

Here are the regression tests from the development version so you can see
how the new behaviour works.

a <- c("NA", NA, "BANANA")
na <- as.character(NA)
a1 <- substr(a,1,1)
stopifnot(is.na(a1)==is.na(a))
a2 <- substring(a,1,1)
stopifnot(is.na(a2)==is.na(a))
a3 <- sub("NA","na",a)
stopifnot(is.na(a3)==is.na(a))
a3 <- gsub("NA","na",a)
stopifnot(is.na(a3)==is.na(a))
substr(a3, 1, 2) <- "na"
stopifnot(is.na(a3)==is.na(a))
substr(a3, 1, 2) <- na
stopifnot(all(is.na(a3)))
stopifnot(agrep("NA", a) == c(1, 3))
stopifnot(grep("NA", a) == c(1, 3))
stopifnot(grep("NA", a, perl=TRUE) == c(1, 3))
stopifnot(agrep(na, a) == 2)
stopifnot(grep(na, a) == 2)
stopifnot(grep(na, a, perl=TRUE) == 2)
a4 <- abbreviate(a)
stopifnot(is.na(a4) == is.na(a))
a5 <- chartr("NA", "na", a)
stopifnot(is.na(a5) == is.na(a))
a6 <- gsub(na, "na", a)
stopifnot(all(!is.na(a6)))
a7 <- a; substr(a7, 1, 2) <- "na"
stopifnot(is.na(a7) == is.na(a))
a8 <- a; substr(a8, 1, 2) <- na
stopifnot(all(is.na(a8)))
stopifnot(identical(a, toupper(tolower(a))))
a9 <- strsplit(a, "NA")
stopifnot(identical(a9, list("",na,c("BA",""))))
a10 <- strsplit(a, na)
stopifnot(identical(a10, as.list(a)))
## but nchar doesn't fit this pattern
stopifnot(all(!is.na(nchar(a))))
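
Most of the checks above have the same shape: apply a function and confirm that the result is NA in exactly the positions where the input was. As a rough sketch, that pattern can be wrapped up as follows. propagates_na() is a hypothetical name introduced here for illustration, and what it returns for a given function depends on the version of R you run it in.

```r
# Hypothetical helper: TRUE if f(x, ...) is NA exactly where x is NA.
propagates_na <- function(f, x, ...) {
  identical(is.na(f(x, ...)), is.na(x))
}

a <- c("NA", NA, "BANANA")

propagates_na(toupper, a)
propagates_na(substr, a, 1, 1)

# paste() is the deliberate exception discussed above.
propagates_na(function(x) paste(x, "!"), a)
```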

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle