[Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Sat Jun 3 19:06:41 CEST 2023


   format(c(1:2, NA)) returns the last element as the string "NA" rather 
than preserving it as a missing value (NA_character_), even if 
na.encode = FALSE (which does the 'expected' thing for character 
vectors, but not for numeric vectors).
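
   A minimal sketch of the difference, using only base R; the variable 
names are illustrative and not part of the original report:

```r
# Same values, once as a numeric-like (integer) vector, once as character
x_num <- c(1:2, NA)
x_chr <- c("1", "2", NA)

f_num <- format(x_num, na.encode = FALSE)
f_chr <- format(x_chr, na.encode = FALSE)

# In the numeric case the missing value has been turned into the string "NA",
# so no element of the result is missing any more
print(is.na(f_num))

# In the character case na.encode = FALSE leaves the third element missing
print(is.na(f_chr))
```

   Comparing the two is.na() results shows that na.encode = FALSE only 
takes effect for the character input.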

   This was already brought up in 2008 in 
https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc 
pointed out the issue. Documentation was added and the bug closed as 
invalid. GG ended with:

 > IMHO it would be better that na.encode argument would also have an
effect for numeric like vectors. Nearly any function in R returns NA 
values and I expected the same for format, at least when na.encode=FALSE.

   I agree!

   I encountered this in the context of printing a data frame with 
na.print = "", which works as expected when printing the individual 
columns but not when printing the whole data frame (because 
print.data.frame calls format.data.frame, which calls format.default 
...).  Example below.

   It's also different from what you would get if you converted to 
character before formatting and printing:

print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")

   Everything about this is documented (if you look carefully enough), 
but IMO it violates the principle of least surprise 
(https://en.wikipedia.org/wiki/Principle_of_least_astonishment), so I 
would call it at least an 'infelicity' (sensu Bill Venables).

   Is there any chance that this design decision could be revisited?

   cheers
    Ben Bolker


---

   Consider

dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3,] <- rep(NA, 4)
print(dd, na.print = "")


print(dd, na.print = "")
  f c  n  i
1 1 1  1  1
2 2 2  2  2
3     NA NA

This is in fact as documented (see below), but seems suboptimal given 
that printing the columns separately with na.print = "" would 
successfully print the NA entries as blank even in the numeric columns:

invisible(lapply(dd, print, na.print = ""))
[1] 1 2
Levels: 1 2
[1] "1" "2"
[1] 1 2
[1] 1 2

* ?print.data.frame documents that it calls format() for each column 
before printing
* the code of print.data.frame() shows that it calls format.data.frame() 
with na.encode = FALSE
* ?format.data.frame specifically notes that na.encode "only applies to 
elements of character vectors, not to numerical, complex nor logical 
‘NA’s, which are always encoded as ‘"NA"’."
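
The steps in the list above can be inspected directly. The following is 
a sketch that imitates the formatting step print.data.frame() is 
described as performing (format.data.frame() with na.encode = FALSE, 
then conversion to a character matrix); the object name m is only for 
illustration:

```r
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# Format every column, then collapse to a character matrix
m <- as.matrix(format.data.frame(dd, na.encode = FALSE))

# Which cells are still missing after formatting?
# Factor and character columns keep their NA; the numeric and integer
# columns now hold the string "NA" instead
print(is.na(m))
```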

    So the NA values in the numeric columns become "NA" rather than 
remaining as NA values, and are thus printed rather than being affected 
by the na.print argument.
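
    Until (or unless) this changes, one possible user-level workaround 
is to format the data frame first and then restore the missing values 
before printing. The helper below is a hypothetical sketch, not 
something that exists in base R, and it assumes that every column is an 
atomic vector or factor (no matrix or list columns):

```r
# Hypothetical helper: print a data frame with NA shown via na.print in
# all columns, including numeric ones
print_blank_na <- function(df, na.print = "", ...) {
  # Format each column; numeric NAs become the string "NA" at this point
  formatted <- format(df, na.encode = FALSE)
  # Put real missing values back wherever the original data were missing
  for (j in seq_along(df)) {
    formatted[[j]][is.na(df[[j]])] <- NA_character_
  }
  # All columns are now character, so na.print is honoured when printing
  print(formatted, na.print = na.print, ...)
  invisible(df)
}

dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)
print_blank_na(dd)
```

    The trade-off is that the columns are formatted twice (once 
explicitly, once inside print.data.frame()), which should be harmless 
for character data but is an extra pass over the data frame.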


