[Rd] sweep sanity checking?
Petr Savicky
savicky at cs.cas.cz
Fri Jul 27 09:10:06 CEST 2007
When I was preparing the patch of sweep submitted on July 25, I was
unaware of the code by Heather Turner. She suggested a very elegant
solution, if STATS is a vector and we want to use meaningful recycling
in full generality. I would like to suggest a combined solution, which
uses Heather Turner's algorithm if check.margin=FALSE (default) and STATS
is a vector and my previous algorithm, if check.margin=TRUE or STATS is
an array. The suggestion is
# combined from the original code of sweep without warnings and from
# https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin
# https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner
# https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker
# with some further modifications by Petr Savicky
sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...)
{
FUN <- match.fun(FUN)
dims <- dim(x)
dimmargin <- dims[MARGIN]
if (is.null(dim(STATS))) {
dimstats <- length(STATS)
} else {
dimstats <- dim(STATS)
check.margin <- TRUE
}
s <- length(STATS)
if (s > prod(dimmargin)) {
warning("length of STATS greater than the extent of dim(x)[MARGIN]")
} else if (check.margin) {
dimmargin <- dimmargin[dimmargin > 1]
dimstats <- dimstats[dimstats > 1]
if (length(dimstats) > length(dimmargin) ||
any(dimstats != dimmargin[seq(along.with=dimstats)]))
warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
} else {
cumDim <- c(1, cumprod(dimmargin))
upper <- min(cumDim[cumDim >= s])
lower <- max(cumDim[cumDim <= s])
if (upper %% s != 0 || s %% lower != 0)
warning("STATS does not recycle exactly across MARGIN")
}
perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}
Heather presented four examples testing her code:
sweep(array(1:24, dim = c(4,3,2)), 1, 1:2) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1, 1:12) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1, 1:24) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) # warning
The second and third example are not really correct, since STATS extends
also to dimensions not included in MARGIN. The problem is better visible
for example in
sweep(array(1:24, dim = c(4,4,3,3,2,2)), c(1,3), 1:12)
where MARGIN clearly has to contain two dimensions explicitly.
So, I use the examples with a larger margin corresponding to STATS
as follows
sweep(array(1:24, dim = c(4,3,2)), 1, 1:2) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:12) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1:3, 1:24) # no warning
sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3) # warning
The current proposal for sweep indeed gives no warning in the first
three examples and gives a warning in the last one.
I did not use the suggestion to call the option warn with default
warn = getOption("warn"). The reason is that there are two
different decisions:
(1) whether to generate a warning
(2) what to do with the warning, if it is generated.
The warn option influences (2): the warning may be suppressed,
printed after the return to the top level, printed immediately or
it may be converted to an error. I think that the option
controling (2) should not be mixed with an option which
controls (1). If R has an option controling to which extent
recycling is allowed, then this could be used, but warn
has a different purpose.
I appreciate feedback.
Petr.
More information about the R-devel
mailing list