[Rd] sweep sanity checking?
Petr Savicky
savicky at cs.cas.cz
Wed Jul 25 09:03:21 CEST 2007
I would like to suggest a patch against R-devel-2007-07-24, which
modifies function sweep by including a warning, if dim(STATS) is not
consistent with dim(x)[MARGIN]. If check.margin=FALSE, the simple test whether
prod(dim(x)[MARGIN]) is a multiple of length(STATS) is performed.
If check.margin=TRUE, then a more restrictive test is used, but a limited
recycling is still allowed without warning. Besides generating a warning
in some situations, there is no other change in the behavior of sweep.
The patch is:
--- R-devel_2007-07-24/src/library/base/R/sweep.R 2007-01-04 17:51:53.000000000 +0100
+++ R-devel_2007-07-24-sweep/src/library/base/R/sweep.R 2007-07-24 10:56:18.000000000 +0200
@@ -1,7 +1,21 @@
-sweep <- function(x, MARGIN, STATS, FUN = "-", ...)
+sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...)
{
FUN <- match.fun(FUN)
dims <- dim(x)
+ if (check.margin) {
+ dimmargin <- dims[MARGIN]
+ dimmargin <- dimmargin[dimmargin > 1]
+ dimstats <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
+ dimstats <- dimstats[dimstats > 1]
+ if (length(dimstats) > length(dimmargin) ||
+ any(dimstats != dimmargin[seq(along.with=dimstats)]))
+ warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
+ } else if (prod(dims[MARGIN]) %% length(STATS) != 0) {
+ if (length(MARGIN) == 1)
+ warning("dim(x)[MARGIN] is not a multiple of length(STATS)")
+ else
+ warning("prod(dim(x)[MARGIN]) is not a multiple of length(STATS)")
+ }
perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}
The patch uses the default check.margin=FALSE, since this is more backward
compatible. Changing the default to check.margin=TRUE would also be fine
with me and also with Ben Bolker, who told me this in a separate email.
Let me include more comments on the stricter test. If check.margin=TRUE,
then the patch tests whether (after deleting possible dimensions with
only one level) dim(STATS) is a prefix of dim(x)[MARGIN]. Hence, for example,
if dim(x)[MARGIN] = c(k1,k2), the cases
length(STATS) = 1,
dim(STATS) = k1,
dim(STATS) = NULL and length(STATS) = k1,
dim(STATS) = c(k1,k2)
are accepted without warning. On the other hand, if k1 != k2, then,
for example, dim(STATS)= k2, dim(STATS) = c(k2,k1) generate
a warning, although the simple divisibility condition
length(STATS) divides prod(dim(x)[MARGIN])
is satisfied. The warning is generated, since in the last two cases,
recycling produces incorrect or at least suspicious result.
In the simplest case, when length(MARGIN)=1 and STATS is a vector,
the cases accepted by the stricter test without warning are exactly the
following two: length(STATS) = 1, length(STATS) = dim(x)[MARGIN].
I tested the patch using the script
http://www.cs.cas.cz/~savicky/R-devel/verify_sweep1.R
Ben Bolker also tested the patch in his environment.
I appreciate to know the opinion of R core developers on this patch.
Thank you in advance.
Petr.
More information about the R-devel
mailing list