[Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jun 5 15:27:16 CEST 2023
>>>>> Ben Bolker
>>>>> on Sat, 3 Jun 2023 13:06:41 -0400 writes:
> format(c(1:2, NA)) gives the last value as "NA" rather than
> preserving it as NA, even if na.encode = FALSE (which does the
> 'expected' thing for character vectors, but not numeric vectors).
> This was already brought up in 2008 in
> https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc
> pointed out the issue. Documentation was added and the bug closed as
> invalid. GG ended with:
>> IMHO it would be better that na.encode argument would also have an
>> effect for numeric like vectors. Nearly any function in R returns NA
>> values and I expected the same for format, at least when na.encode=FALSE.
> I agree!
I do too, at least "in principle", keeping in mind that
backward compatibility is also an important principle ...
Not sure whether the 'na.encode' argument should control this, or
possibly a new optional argument, but "in principle" I think that
format(c(1:2, NA, 4))
should preserve is.na(.) even by default.
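
For illustration only, a minimal sketch of the current behaviour and of a
manual workaround. The variable names are arbitrary, and re-inserting the NAs
by hand is an assumption about what a user might do today, not a proposed
patch to format():

```r
x <- c(1:2, NA, 4)

# Numeric input: the missing value is encoded as the string "NA",
# and na.encode = FALSE does not change that.
fx_now <- format(x, na.encode = FALSE)
anyNA(fx_now)   # FALSE

# Character input: na.encode = FALSE keeps the NA as a real missing value.
fc <- format(c("1", "2", NA), na.encode = FALSE)
is.na(fc)

# Manual workaround: restore the missing values after formatting,
# so that is.na() of the result agrees with is.na() of the input.
fx <- format(x)
fx[is.na(x)] <- NA_character_
print(fx, na.print = "", quote = FALSE)
```

The last call prints a blank in place of the missing value, which is what
na.print = "" is meant to achieve.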
> I encountered this in the context of printing a data frame with
> na.print = "", which works as expected when printing the individual
> columns but not when printing the whole data frame (because
> print.data.frame calls format.data.frame, which calls format.default
> ...). Example below.
> It's also different from what you would get if you converted to
> character before formatting and printing:
> print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")
> Everything about this is documented (if you look carefully enough),
> but IMO it violates the principle of least surprise
> https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I
> would call it at least an 'infelicity' (sensu Bill Venables)
> Is there any chance that this design decision could be revisited?
We'd have to hear other opinions / gut feelings.
Also, someone (not me) would ideally volunteer to run
'R CMD check <pkg>' for a few 1000 (not necessarily all) CRAN &
BioC packages with an accordingly patched version of R-devel
(I might volunteer to create such a branch, e.g., a bit before the R
Sprint 2023 end of August).
> cheers
> Ben Bolker
> ---
The following issue you are raising
may really be a *different* one, as it involves format() and
print() methods for "data.frame", i.e.,
format.data.frame() vs
print.data.frame()
which is closely related, of course, to how 'numeric'
columns are formatted -- as you note yourself below;
I vaguely recall that the data.frame method could be an even
"harder problem" .. but I don't remember the details.
It may also be that there are no changes necessary to the
*.data.frame() methods, and only the documentation (you mention)
should be updated ...
Martin
> Consider
> dd <- data.frame(f = factor(1:2), c = as.character(1:2), n =
> as.numeric(1:2), i = 1:2)
> dd[3,] <- rep(NA, 4)
> print(dd, na.print = "")
>   f c  n  i
> 1 1 1  1  1
> 2 2 2  2  2
> 3     NA NA
> This is in fact as documented (see below), but seems suboptimal given
> that printing the columns separately with na.print = "" would
> successfully print the NA entries as blank even in the numeric columns:
> invisible(lapply(dd, print, na.print = ""))
> [1] 1 2
> Levels: 1 2
> [1] "1" "2"
> [1] 1 2
> [1] 1 2
> * ?print.data.frame documents that it calls format() for each column
> before printing
> * the code of print.data.frame() shows that it calls format.data.frame()
> with na.encode = FALSE
> * ?format.data.frame specifically notes that na.encode "only applies to
> elements of character vectors, not to numerical, complex nor logical
> 'NA's, which are always encoded as '"NA"'".
> So the NA values in the numeric columns become "NA" rather than
> remaining as NA values, and are thus printed rather than being affected
> by the na.print argument.
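
The mechanism described above can be sketched in user code. The helper name
print_blank_na() is hypothetical. It follows the same steps the thread
attributes to print.data.frame() (format with na.encode = FALSE, then print
the resulting character matrix), with one added step that restores the
missing values in the non-character columns:

```r
# Hypothetical helper, not part of base R.
print_blank_na <- function(df, na.print = "") {
  # Format every column; character and factor NAs survive as NA,
  # numeric, complex and logical NAs come back as the string "NA".
  m <- as.matrix(format.data.frame(df, na.encode = FALSE))
  # Restore the missing values using the original data frame as the mask.
  m[is.na(df)] <- NA_character_
  print(m, na.print = na.print, quote = FALSE, right = TRUE)
  invisible(df)
}

dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)
print_blank_na(dd)
```

With this, row 3 is printed with blanks in all four columns. Column widths
are still computed while the "NA" strings are in place, so the numeric
columns stay two characters wide.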