[Rd] infelicity in `na.print = ""` for numeric columns of data frames/formatting numeric values

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jun 5 15:27:16 CEST 2023


>>>>> Ben Bolker 
>>>>>     on Sat, 3 Jun 2023 13:06:41 -0400 writes:

    > format(c(1:2, NA)) gives the last value as "NA" rather than 
    > preserving it as NA, even if na.encode = FALSE (which does the 
    > 'expected' thing for character vectors, but not numeric vectors).

    > This was already brought up in 2008 in 
    > https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc 
    > pointed out the issue. Documentation was added and the bug closed as 
    > invalid. GG ended with:

    >> IMHO it would be better that na.encode argument would also have an
    >> effect for numeric like vectors. Nearly any function in R returns NA 
    >> values and I expected the same for format, at least when na.encode=FALSE.

    > I agree!

I do too, at least "in principle", keeping in mind that
backward compatibility is also an important principle ...

Not sure if the 'na.encode' argument should control this or whether
a new optional argument would be needed, but "in principle" I think that

  format(c(1:2, NA, 4))

should preserve is.na(.) even by default.
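
To make "preserve is.na(.)" concrete, here is a minimal sketch of the behaviour under discussion, written as a user-level wrapper. The name format_preserve_na() is hypothetical and not part of R; it only post-processes the result of the existing format(), so the common field width is still computed as if "NA" were printed.

```r
# Hypothetical helper (not in base R): format as usual, then restore NA
# at the positions where the input was missing.
format_preserve_na <- function(x, ...) {
  out <- format(x, ...)
  # Note: is.na() is also TRUE for NaN, so NaN would become NA here too.
  out[is.na(x)] <- NA_character_
  out
}

y <- format_preserve_na(c(1:2, NA, 4))
is.na(y)                  # missingness now survives formatting
print(y, na.print = "")   # so na.print can take effect
```

Whether such behaviour should be the default, be tied to na.encode, or sit behind a new argument is exactly the open question; the wrapper takes no position on that.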

    > I encountered this in the context of printing a data frame with 
    > na.print = "", which works as expected when printing the individual 
    > columns but not when printing the whole data frame (because 
    > print.data.frame calls format.data.frame, which calls format.default 
    > ...).  Example below.

    > It's also different from what you would get if you converted to 
    > character before formatting and printing:

    > print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")
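
The contrast between the numeric and the character code path can be checked directly. This is a small sketch using only base R calls that appear in the message; the variable names are chosen here for illustration.

```r
x <- c(1:2, NA)

# Numeric input: the missing value is encoded as the string "NA",
# regardless of na.encode.
num <- format(x, na.encode = FALSE)

# Character input: with na.encode = FALSE the missing value stays NA.
chr <- format(as.character(x), na.encode = FALSE)

is.na(num)
is.na(chr)

# na.print only affects real NA entries, so it changes the second
# print() call but not the first.
print(num, na.print = "")
print(chr, na.print = "")
```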

    > Everything about this is documented (if you look carefully enough), 
    > but IMO it violates the principle of least surprise 
    > https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I 
    > would call it at least an 'infelicity' (sensu Bill Venables)

    > Is there any chance that this design decision could be revisited?

We'd have to hear other opinions / gut feelings.

Also, someone (not me) would ideally volunteer to run
'R CMD check <pkg>' for a few thousand (not necessarily all) CRAN &
BioC packages with an accordingly patched version of R-devel
(I might volunteer to create such a branch, e.g., a bit before the
 R Sprint 2023 at the end of August).


    > cheers
    > Ben Bolker


    > ---

The following issue you are raising
may really be a *different* one, as it involves format() and
print() methods for "data.frame", i.e.,

   format.data.frame() vs
    print.data.frame()

which is, of course, closely related to how 'numeric'
columns are formatted -- as you note yourself below;
I vaguely recall that the data.frame method could be an even
"harder problem" .. but I don't remember the details.

It may also be that there are no changes necessary to the
*.data.frame() methods, and only the documentation (you mention)
should be updated ...

Martin

    > Consider

    > dd <- data.frame(f = factor(1:2), c = as.character(1:2), n = 
    > as.numeric(1:2), i = 1:2)
    > dd[3,] <- rep(NA, 4)
    > print(dd, na.print = "")


    > print(dd, na.print = "")
    >   f c  n  i
    > 1 1 1  1  1
    > 2 2 2  2  2
    > 3     NA NA

    > This is in fact as documented (see below), but seems suboptimal given 
    > that printing the columns separately with na.print = "" would 
    > successfully print the NA entries as blank even in the numeric columns:

    > invisible(lapply(dd, print, na.print = ""))
    > [1] 1 2
    > Levels: 1 2
    > [1] "1" "2"
    > [1] 1 2
    > [1] 1 2

    > * ?print.data.frame documents that it calls format() for each column 
    > before printing
    > * the code of print.data.frame() shows that it calls format.data.frame() 
    > with na.encode = FALSE
    > * ?format.data.frame specifically notes that na.encode "only applies to 
    > elements of character vectors, not to numerical, complex nor logical 
    > ‘NA’s, which are always encoded as ‘"NA"’."

    > So the NA values in the numeric columns become "NA" rather than 
    > remaining as NA values, and are thus printed rather than being affected 
    > by the na.print argument.
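
The chain described above can be inspected step by step, and a user-level workaround is possible without touching R itself. The following is a sketch under stated assumptions: format_df_keep_na() is a hypothetical helper (not part of R), it assumes atomic columns only, and the final as.matrix() plus print() call merely imitates what the message says print.data.frame() does after formatting.

```r
# Same example data as in the message.
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# Step 1: format the data frame the way print.data.frame() is said to.
fd <- format(dd, na.encode = FALSE)

# Which columns still hold a real NA in row 3 after formatting?
na_row3 <- vapply(fd, function(col) is.na(col[3]), logical(1))
na_row3

# Step 2 (hypothetical helper): restore NA wherever the original
# column was missing, so that na.print can act on every column.
format_df_keep_na <- function(df, ...) {
  out <- format(df, ..., na.encode = FALSE)
  for (j in seq_along(df)) {
    out[[j]][is.na(df[[j]])] <- NA_character_
  }
  out
}

# Step 3: print the formatted character matrix with blank NAs.
m <- as.matrix(format_df_keep_na(dd))
print(m, quote = FALSE, na.print = "")
```

This leaves column widths as computed by format(), so the blank cells are as wide as "NA" would have been; it is meant to illustrate the mechanism, not to propose a patch.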



