[Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Sat Jun 3 19:06:41 CEST 2023
format(c(1:2, NA)) gives the last value as "NA" rather than
preserving it as NA, even if na.encode = FALSE (which does the
'expected' thing for character vectors, but not numeric vectors).
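A minimal sketch of the difference (the variable names are illustrative only, and this assumes the behaviour described in ?format):

```r
# Numeric input: the NA is encoded as the string "NA" despite na.encode = FALSE
num_fmt <- format(c(1:2, NA), na.encode = FALSE)

# Character input: the NA is preserved as a real missing value
chr_fmt <- format(c("1", "2", NA), na.encode = FALSE)

is.na(num_fmt)  # FALSE FALSE FALSE
is.na(chr_fmt)  # FALSE FALSE  TRUE
```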
This was already brought up in 2008 in
https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc
pointed out the issue. Documentation was added and the bug closed as
invalid. GG ended with:
> IMHO it would be better that na.encode argument would also have an
> effect for numeric like vectors. Nearly any function in R returns NA
> values and I expected the same for format, at least when na.encode=FALSE.
I agree!
I encountered this in the context of printing a data frame with
na.print = "", which works as expected when printing the individual
columns but not when printing the whole data frame (because
print.data.frame calls format.data.frame, which calls format.default
...). Example below.
It's also different from what you would get if you converted to
character before formatting and printing:
print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")
Everything about this is documented (if you look carefully enough),
but IMO it violates the principle of least surprise
(https://en.wikipedia.org/wiki/Principle_of_least_astonishment), so I
would call it at least an 'infelicity' (sensu Bill Venables).
Is there any chance that this design decision could be revisited?
cheers
Ben Bolker
---
Consider
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3,] <- rep(NA, 4)
print(dd, na.print = "")
  f c  n  i
1 1 1  1  1
2 2 2  2  2
3     NA NA
This is in fact as documented (see below), but seems suboptimal given
that printing the columns separately with na.print = "" would
successfully print the NA entries as blank even in the numeric columns:
invisible(lapply(dd, print, na.print = ""))
[1] 1 2
Levels: 1 2
[1] "1" "2"
[1] 1 2
[1] 1 2
* ?print.data.frame documents that it calls format() for each column
before printing
* the code of print.data.frame() shows that it calls format.data.frame()
with na.encode = FALSE
* ?format.data.frame specifically notes that na.encode "only applies to
elements of character vectors, not to numerical, complex nor logical
‘NA’s, which are always encoded as ‘"NA"’."
So the NA values in the numeric columns become "NA" rather than
remaining as NA values, and are thus printed rather than being affected
by the na.print argument.
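One possible workaround, sketched here under the assumption that it is acceptable to format the data frame yourself before printing (the name `fmt` is hypothetical, and this is not a proposed fix for base R), is to restore the NAs after formatting so that na.print can act on them:

```r
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# Format every column to character; numeric NAs become the string "NA" here
fmt <- format(dd, na.encode = FALSE)

# Put real NAs back wherever the original data were missing
fmt[is.na(dd)] <- NA

# All columns are now character, so na.print applies to every NA entry
print(fmt, na.print = "")
```

This only changes the printed representation; dd itself is untouched.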