[Rd] Development version of R: Improved nchar(), nzchar() but changed API
Martin Maechler
maechler at lynne.stat.math.ethz.ch
Fri Apr 24 12:06:23 CEST 2015
Those of you who track R development closely
will have noticed yesterday's commit of enhanced versions of
nchar() and nzchar():
------------------------------------------------------------------------
r68254 | maechler | 2015-04-23 18:06:37 +0200 (Thu, 23 Apr 2015) | 1 line
Changed paths:
M doc/NEWS.Rd
M src/library/base/R/New-Internal.R
M src/library/base/R/zzz.R
M src/library/base/man/nchar.Rd
M src/main/character.c
M src/main/names.c
M tests/reg-tests-1a.R
nchar(x) now gives NA for character NAs, configurably via nchar(x, keepNA=*); analogously for nzchar()
------------------------------------------------------------------------
Both functions are enhanced via the new argument 'keepNA' (a logical,
i.e., TRUE/FALSE/NA), but in the current implementation they are also
*not* backward compatible. Here's how it works [currently], showing the
input and output of the (slightly abridged) example(nchar):
> x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
> x[3] <- NA; x
[1] "asfef" "qwerty" NA "b"
[5] "stuff.blah.yech"
> nchar(x, keepNA= TRUE) # 5 6 NA 1 15
[1] 5 6 NA 1 15
> nchar(x, keepNA=FALSE) # 5 6 2 1 15
[1] 5 6 2 1 15
> stopifnot(identical(nchar(x ), nchar(x, keepNA= TRUE)),
identical(nchar(x, "w"), nchar(x, keepNA=FALSE)))
>
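The commit message says nzchar() changes "analogously". Here is a minimal sketch of that side, assuming an R version whose nzchar() has the keepNA argument (R-devel r68254 or later). keepNA is passed explicitly, so the results do not depend on which default a given version uses:

```r
x <- c("a", "", NA)
# keepNA = TRUE: a missing string gives a missing answer
nzchar(x, keepNA = TRUE)   # TRUE FALSE NA
# keepNA = FALSE: NA is treated like the non-empty string "NA"
nzchar(x, keepNA = FALSE)  # TRUE FALSE TRUE
```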
The main reason for the change: it is more logical for NA_character_
elements of x to become NA_integer_ in the result. That is what happens
with 'keepNA = TRUE', which can be read as "keep/preserve the NA's that
were in x (the main argument)".
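A small sketch of what "preserving the NA's" means in practice, assuming an R version with the keepNA argument. The result stays an integer vector, and its NA positions are exactly those of x:

```r
x <- c("asfef", "qwerty", NA, "b")
n <- nchar(x, keepNA = TRUE)
typeof(n)                       # "integer"
identical(is.na(n), is.na(x))   # TRUE: NAs stay where they were in x
```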
If you use nchar(x, type = "width"), or its short form nchar(x, "w"),
you implicitly ask for 'keepNA = FALSE',
because "width" is about output / formatting / etc., and there
you'd typically want
nchar(c("ABC", NA), "width")
to give 3 2, which is what happens unconditionally in R <= 3.2.0.
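The formatting case as a sketch, assuming the semantics described above: the "width" type (the one "w" partially matches) counts the columns a string occupies when printed, and a missing string prints as the two characters NA. An explicit keepNA = TRUE still overrides that:

```r
y <- c("ABC", NA)
nchar(y, "width")                 # 3 2: NA is printed as "NA"
nchar(y, "width", keepNA = TRUE)  # 3 NA: explicitly keep the NA
```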
We've found that quite a few CRAN packages "break" (fail R CMD check)
with R-devel r68254, because I had clearly underestimated the
number of places where existing R code relies on the
"pre-R-devel" (aka "current R") semantics of nchar() and
nzchar(), which for R <= 3.2.0 are documented as
Value:
For ‘nchar’, an integer vector giving the sizes of each element,
__currently__ always ‘2’ for missing values (for ‘NA’).
(my emphasis added to "currently").
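Code that relied on this documented value of 2 can state it explicitly instead of depending on the default. A minimal sketch; nchar_old is a hypothetical helper name, not part of base R, and it avoids the keepNA argument so it runs in old and new versions alike:

```r
# Hypothetical helper (not in base R): reproduce the R <= 3.2.0
# value of 2 for missing values, whatever the version's default.
nchar_old <- function(x, type = "chars") {
  n <- nchar(x, type = type)
  n[is.na(x)] <- 2L  # "NA" is two characters wide
  n
}
nchar_old(c("ABC", NA))  # 3 2
```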
As package authors using R-devel, you may want to wait a day if
you see problems with R-devel that you don't see with R 3.2.0,
but you should become aware of the slightly changed semantics of
nchar() and nzchar().
Longer term, the change should make R more "internally coherent",
in the sense that vectorized R functions preserve NA's by default.
Martin Maechler,
ETH Zurich