[Rd] sum(), min(), max(), prod() vs. named arguments in ...

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Fri Apr 14 12:38:00 CEST 2023


Hello R-devel,

As mentioned on Fosstodon [1] and discussed during RCOH [2], named
arguments to sum(), min(), max() and prod() are currently processed the
same way as the un-named ones. Additionally, when specifying the na.rm
argument more than once, the last specified value is silently taken
into account.

Most of this behaviour is exactly as documented, but it's arguably a
source of mistakes when the na.rm=TRUE argument is mistyped and
converted to 1.

Internally, all these calls land in do_summary(), and the na.rm
argument is processed by fixup_NaRm(). mean() is safe because it's
dispatched via S3 lookup.

Do users normally name their arguments to sum()? If not, it could be
relatively safe to make it a warning to pass named arguments to sum()
that are not na.rm and later transition it to an error:

--- src/main/summary.c	(revision 84252)
+++ src/main/summary.c	(working copy)
@@ -419,6 +419,8 @@
 	    na_value = CAR(a);
 	    if(prev == R_NilValue) args = CDR(a);
 	    else SETCDR(prev, CDR(a));
+	} else if (TAG(a) && TAG(a) != R_NilValue) {
+	    warning("Named argument \"%s\" here is probably a mistake", CHAR(PRINTNAME(TAG(a))));
 	}
 	prev = a;
     }

(Can TAG(a) ever be NULL or anything other than R_NilValue or a symbol
here? We'll probably need to translate this message, too.)

Passing na.rm more than once could be made into an error right away:

--- src/main/summary.c	(revision 84252)
+++ src/main/summary.c	(working copy)
@@ -409,6 +409,7 @@
 attribute_hidden
 SEXP fixup_NaRm(SEXP args)
 {
+    Rboolean seen_NaRm = FALSE;
     SEXP t, na_value;
 
     /* Need to make sure na.rm is last and exists */
@@ -415,7 +416,9 @@
     na_value = ScalarLogical(FALSE);
     for(SEXP a = args, prev = R_NilValue; a != R_NilValue; a = CDR(a)) {
 	if(TAG(a) == R_NaRmSymbol) {
+	    if(seen_NaRm) error("Please specify na.rm only once");
+	    seen_NaRm = TRUE;
 	    if(CDR(a) == R_NilValue) return args;
 	    na_value = CAR(a);
 	    if(prev == R_NilValue) args = CDR(a);
 	    else SETCDR(prev, CDR(a));

(Can we use error(_("formal argument \"%s\" matched by multiple actual
arguments"), "na.rm") to emit an already translated error message?)

Additionally, is it desirable to have S3 methods for sum() and its
friends? Currently, they are dispatched, but something confusing
happens to the arguments: according to the method, the dots contain
what I would expect them to contain (only the numbers to sum), but when
calling NextMethod() to compute the actual sum, extra arguments get
processed:

sum.foo <- function(..., ExtraArg = 999, na.rm = FALSE) {
 print(ExtraArg)
 str(...)
 NextMethod(generic = 'sum', ..., na.rm = na.rm)
}
sum(structure(100, class = 'foo'))
# [1] 999
#  'foo' num 100
# [1] 100
sum(structure(100, class = 'foo'), ExtraArg = -1)
# [1] -1
#  'foo' num 100 # <-- ExtraArg = -1 missing from ...
# [1] 99         # <-- but still subtracted

Maybe I'm just misusing NextMethod(). This is probably not important in
practice due to how S3 dispatch works.

-- 
Best regards,
Ivan

[1]
https://fosstodon.org/@nibsalot/110105034507818285

[2]
https://github.com/r-devel/rcontribution/blob/main/office_hours/2023-04-13_EMEA-APAC.md



More information about the R-devel mailing list