[Rd] Potential bugs in table dnn

SOEIRO Thomas Thom@@@SOEIRO @end|ng |rom @p-hm@|r
Wed Oct 13 13:12:09 CEST 2021


Inline comments are below, within the quoted previous message.

I'm not 100% sure if the current behavior is intended or not. If not, here is a patch (which I can submit on R Bugzilla if appropriate):


diff -u orig/table.R mod/table.R
--- orig/table.R	2021-10-13 10:04:28.560912800 +0200
+++ mod/table.R	2021-10-13 10:43:43.815915100 +0200
@@ -1,7 +1,7 @@
 #  File src/library/base/R/table.R
 #  Part of the R package, https://www.R-project.org
 #
-#  Copyright (C) 1995-2020 The R Core Team
+#  Copyright (C) 1995-2021 The R Core Team
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -50,9 +50,8 @@
     args <- list(...)
     if (length(args) == 1L && is.list(args[[1L]])) { ## e.g. a data.frame
 	args <- args[[1L]]
-	if (length(dnn) != length(args))
-	    dnn <- if (!is.null(argn <- names(args))) argn
-		   else paste(dnn[1L], seq_along(args), sep = ".")
+	dnn <- if (!is.null(argn <- names(args))) argn
+	       else paste(dnn[1L], seq_along(args), sep = ".")
     }
     if (!length(args))
 	stop("nothing to tabulate")
diff -u orig/table.Rd mod/table.Rd
--- orig/table.Rd	2021-10-13 11:39:45.839097000 +0200
+++ mod/table.Rd	2021-10-13 11:56:25.620660900 +0200
@@ -1,6 +1,6 @@
 % File src/library/base/man/table.Rd
 % Part of the R package, https://www.R-project.org
-% Copyright 1995-2016 R Core Team
+% Copyright 1995-2021 R Core Team
 % Distributed under GPL 2 or later
 
 \name{table}
@@ -48,7 +48,7 @@
   \item{useNA}{whether to include \code{NA} values in the table.
     See \sQuote{Details}.  Can be abbreviated.}
   \item{dnn}{the names to be given to the dimensions in the result (the
-    \emph{dimnames names}).}
+    \emph{dimnames names}).  See \sQuote{Details}.}
   \item{deparse.level}{controls how the default \code{dnn} is
     constructed.  See \sQuote{Details}.}
   \item{x}{an arbitrary \R object, or an object inheriting from class
@@ -64,12 +64,15 @@
   \item{sep, base}{passed to \code{\link{provideDimnames}}.}
 }
 \details{
-  If the argument \code{dnn} is not supplied, the internal function
+  If ... is one or more objects which can be interpreted as factors
+  and the argument \code{dnn} is not supplied, the internal function
   \code{list.names} is called to compute the \sQuote{dimname names}.  If the
   arguments in \code{\dots} are named, those names are used.  For the
   remaining arguments, \code{deparse.level = 0} gives an empty name,
   \code{deparse.level = 1} uses the supplied argument if it is a symbol,
-  and \code{deparse.level = 2} will deparse the argument.
+  and \code{deparse.level = 2} will deparse the argument.  Otherwise,
+  if ... is a list (or data frame), its names are used as the
+  \sQuote{dimname names} and the argument \code{dnn} is not used.
 
   Only when \code{exclude} is specified (i.e., not by default) and
   non-empty, will \code{table} potentially drop levels of factor



> Dear list,
> 
> table does not set dnn for dataframes of length 1:
> 
> table(warpbreaks[2:3]) # has dnn
> #     tension
> # wool L M H
> #    A 9 9 9
> #    B 9 9 9
> 
> table(warpbreaks[2]) # has no dnn
> # 
> #  A  B 
> # 27 27 
> 
> This is because of if (length(dnn) != length(args)) (line 53 in https://github.com/wch/r-source/blob/trunk/src/library/base/R/table.R). When commenting this line or modifying it to if (length(dnn) != length(args) || dnn == ""), dnn are set as expected:
> 
> table2(warpbreaks[2:3]) # has dnn
> #     tension
> # wool L M H
> #    A 9 9 9
> #    B 9 9 9
> 
> table2(warpbreaks[2]) # has dnn
> # wool
> #  A  B 
> # 27 27 
> 
> However, I do not get the logic for the initial if (length(dnn) != length(args)), so the change may break something else...

I guess the purpose of this line is to make it possible to set the dimname names through the dnn argument for lists (or data frames) of length 1, e.g.:

table(warpbreaks[2], dnn = "xxx")
# xxx
#  A  B 
# 27 27

However, this seems inconsistent with the behavior for lists (or data frames) of length >1. Removing the exception introduced by the if clause restores consistency in the dimname names for lists (or data frames), whatever their length.
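
To make the effect of the if clause easier to see, here is a minimal sketch of the dnn selection for list input, with and without the guard. resolve_dnn is a hypothetical helper written for this message, not a function in base R, and the sketch assumes that the default dnn for a call such as table(warpbreaks[2]) is "" (deparse.level = 1 and an argument that is not a symbol):

```r
# Hypothetical helper (not part of base R): mirrors how table() picks the
# dimnames names once a single list or data frame argument has been unpacked.
resolve_dnn <- function(args, dnn, patched = FALSE) {
    argn <- names(args)
    # Unpatched: dnn is replaced only when its length differs from length(args).
    # Patched: dnn is always replaced for list input.
    if (patched || length(dnn) != length(args))
        dnn <- if (!is.null(argn)) argn
               else paste(dnn[1L], seq_along(args), sep = ".")
    dnn
}

resolve_dnn(warpbreaks[2:3], dnn = "")                   # lengths differ: names of the data frame
resolve_dnn(warpbreaks[2], dnn = "")                     # lengths match: the empty name is kept
resolve_dnn(warpbreaks[2], dnn = "", patched = TRUE)     # name of the data frame
resolve_dnn(warpbreaks[2], dnn = "xxx", patched = TRUE)  # dnn is ignored
```

With patched = FALSE, the length-1 case is the only one where the empty default survives, which is the asymmetry discussed above.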


> In addition, table documentation says "If the argument dnn is not supplied, the internal function list.names is called to compute the 'dimname names'. If the arguments in ... are named, those names are used." Some cases seem inconsistent or may return a warning:

The documentation is not very clear about how dimname names are computed for lists (or data frames). If the if clause is removed [i.e., consistent behavior for lists (or data frames), see above], I think the only change needed is to document the "precedence" of list (or data frame) names over dnn when ... is a list (or data frame), e.g.: "if ... is a list (or data frame), its names are used as the \sQuote{dimname names} and the argument \code{dnn} is not used." (see the patch above)


> table(warpbreaks[2], dnn = letters) # no warning/not as documented
> # wool
> #  A  B 
> # 27 27 
> 
> table(warpbreaks[2], dnn = letters[1]) # as documented
> # a
> #  A  B 
> # 27 27 
> 
> table(zzz = warpbreaks[2], dnn = letters[1]) # as documented
> # a
> #  A  B 
> # 27 27 
> 
> table(zzz = warpbreaks$wool, dnn = letters[1]) # as documented
> # a
> #  A  B 
> # 27 27 
> 
> table(warpbreaks$wool, dnn = letters) # as expected
> # Error in names(dn) <- dnn : 
> #   'names' attribute [26] must be the same length as the vector [1]
> 
> Best regards,
> 
> Thomas
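
The difference between the two dnn = letters cases quoted above can be reproduced outside table(). This is only a sketch: the variable names are illustrative, and the guard is simplified to the case where the list has names.

```r
# Dimnames of a one-way table: a list of length 1
dn <- list(levels(warpbreaks$wool))

# Vector input: assigning 26 names to a length-1 list fails in `names<-`
msg <- tryCatch({ names(dn) <- letters; NA_character_ },
                error = function(e) conditionMessage(e))

# List input with the unpatched guard: 26 != 1, so dnn is silently
# replaced by the names of the list (simplified: names assumed non-NULL)
args <- warpbreaks[2]
dnn <- letters
if (length(dnn) != length(args)) dnn <- names(args)
dnn
```

The error message text in msg depends on the session locale, so it is not shown here.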


