[Rd] sweep sanity checking?
Martin Maechler
maechler at stat.math.ethz.ch
Mon Sep 3 08:44:40 CEST 2007
I've been asked to close this thread "in public"
>>>>> "TH" == Turner, Heather <Heather.Turner at warwick.ac.uk>
>>>>> on Wed, 22 Aug 2007 10:20:21 +0100 writes:
TH> Petr Savicky kindly brought this thread to my attention
TH> as I'm afraid it had passed me by. As one of the
TH> contributors to the earlier discussion on adding
TH> warnings to sweep I would like to give my support to
TH> Petr's proposed patch.
Petr's proposed patch has been added to R-devel (to become
R-2.6.0 alpha in two days) last week.
Actually it was found subsequently that there was a small
problem in the patch [basically because n %% n is zero only
for n != 0 and the case n==0 is valid].
If you are interested in using the new sweep() version, and because
we are entering alpha testing of R 2.6.0,
I'd recommend you to start using "R-devel".
Martin Maechler
TH> For the record I should say that Petr was right to point
TH> out that the use of MARGIN in my examples did not make
TH> sense
TH> https://stat.ethz.ch/pipermail/r-devel/2007-July/046487.html
TH> so I have no quibble with that.
TH> I think it is sensible too, to use the dim attribute of
TH> STATS as the basis of the test, when the dim attribute
TH> is present. This provides a way to control the strength
TH> of the test in the case of sweeping out a vector, as
TH> Petr describes in his message below. I think that the
TH> proposed patch successfully brings together the
TH> different views on what should be tested, which was the
TH> stumbling block last time around
TH> https://stat.ethz.ch/pipermail/r-help/2005-June/074037.html
TH> Even if people set check.margin = FALSE for reasons of
TH> speed, this in itself should be a useful check, since
TH> they will need to be confident that the test is
TH> unnecessary.
TH> Heather
TH> -----Original Message----- From:
TH> r-devel-bounces at r-project.org
TH> [mailto:r-devel-bounces at r-project.org] On Behalf Of Petr
TH> Savicky Sent: 08 August 2007 07:54 To:
TH> r-devel at r-project.org Subject: Re: [Rd] sweep sanity
TH> checking?
TH> Thanks to Martin Maechler for his comments, advice and
TH> for pointing out the speed problem. Thanks also to Ben
TH> Bolker for tests of speed, which confirm that for small
TH> arrays, a slow down by a factor of about 1.2 - 1.5 may
TH> occur. Now, I would like to present a new version of
TH> sweep, which is simpler and has an option to avoid the
TH> test. This is expected to be used in scripts, where the
TH> programmer is quite sure that the usage is correct and
TH> speed is required. The new version differs from the
TH> previous one in the following:
TH> 1. The option check.margin has a different meaning. It
TH> defaults to TRUE and it determines whether the test is
TH> performed or not.
TH> 2. Since check.margin has the meaning above, it cannot
TH> be used to select, which test should be performed. This
TH> depends on the type of STATS. The suggested sweep
TH> function contains two tests: - a vector test by Heather
TH> Turner, which is used, if STATS has no dim attribute
TH> and, hence, is a vector (STATS should not be anything
TH> else than a vector or an array) - an array test used if
TH> STATS has dim attribute. The vector test allows some
TH> kinds of recycling, while the array test does
TH> not. Hence, in the most common case, where x is a matrix
TH> and STATS is a vector, if the user likes to be warned if
TH> the length of the vector is not exactly the right one,
TH> the following call is suggested:
TH> sweep(x,MARGIN,as.array(STATS)). Otherwise, a warning
TH> will be generated only if length(STATS) does not divide
TH> the specified dimension of x, which is nrow(x)
TH> (MARGIN=1) or ncol(x) (MARGIN=2).
TH> 3. If STATS is an array, then the test is more
TH> restrictive than in the previous version. It is now
TH> required that after deleting dimensions with one level,
TH> the remaining dimensions coincide. The previous version
TH> allowed additionally the cases, when dim(STATS) is a
TH> prefix of dim(x)[MARGIN], for example, if dim(STATS) =
TH> k1 and dim(x)[MARGIN] = c(k1,k2).
TH> The code of the tests in the suggested sweep is based on
TH> the previous suggestions
TH> https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html
TH> by Robin Hankin
TH> https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html
TH> by Heather Turner
TH> https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html
TH> by Ben Bolker with some further modifications.
TH> The modification of sweep.Rd was prepared by Ben Bolker
TH> and me.
TH> I would like to encourage everybody who likes to express
TH> his opinion on the patch to do it now. In my opinion,
TH> the suggestion of the new code stabilized in the sense
TH> that I will not modify it unless there is a negative
TH> feedback.
TH> A patch against R-devel_2007-08-06 is attached. It
TH> contains tabs. If they are corrupted by email transfer,
TH> use the link
TH> http://www.cs.cas.cz/~savicky/R-devel/patch-sweep which
TH> is an identical copy.
TH> Petr Savicky.
TH> ========================================================================
TH> ======================== ---
TH> R-devel_2007-08-06/src/library/base/R/sweep.R 2007-07-27
TH> 17:51:13.000000000 +0200 +++
TH> R-devel_2007-08-06-sweep/src/library/base/R/sweep.R
TH> 2007-08-07 10:30:12.383672960 +0200 @@ -14,10 +14,29 @@
TH> # A copy of the GNU General Public License is available
TH> at # http://www.r-project.org/Licenses/
TH> -sweep <- function(x, MARGIN, STATS, FUN = "-", ...)
TH> +sweep <- function(x, MARGIN, STATS, FUN = "-",
TH> check.margin=TRUE, ...) { FUN <- match.fun(FUN) dims <-
TH> dim(x) + if (check.margin) { + dimmargin <- dims[MARGIN]
TH> + dimstats <- dim(STATS) + lstats <- length(STATS) + if
TH> (lstats > prod(dimmargin)) { + warning("length of STATS
TH> greater than the extent of dim(x)[MARGIN]") + } else if
TH> (is.null(dimstats)) { # STATS is a vector + cumDim <-
TH> c(1, cumprod(dimmargin)) + upper <- min(cumDim[cumDim >=
TH> lstats]) + lower <- max(cumDim[cumDim <= lstats]) + if
TH> (upper %% lstats != 0 || lstats %% lower != 0) +
TH> warning("STATS does not recycle exactly across MARGIN")
TH> + } else { + dimmargin <- dimmargin[dimmargin > 1] +
TH> dimstats <- dimstats[dimstats > 1] + if
TH> (length(dimstats) != length(dimmargin) || any(dimstats
TH> != dimmargin)) + warning("length(STATS) or dim(STATS) do
TH> not match dim(x)[MARGIN]") + } + } perm <- c(MARGIN,
TH> (1:length(dims))[ - MARGIN]) FUN(x, aperm(array(STATS,
TH> dims[perm]), order(perm)), ...) } ---
TH> R-devel_2007-08-06/src/library/base/man/sweep.Rd
TH> 2007-07-27 17:51:35.000000000 +0200 +++
TH> R-devel_2007-08-06-sweep/src/library/base/man/sweep.Rd
TH> 2007-08-07 10:29:45.517757200 +0200 @@ -11,7 +11,7 @@
TH> statistic. } \usage{ -sweep(x, MARGIN, STATS, FUN="-",
TH> \dots) +sweep(x, MARGIN, STATS, FUN="-",
TH> check.margin=TRUE, \dots) } \arguments{ \item{x}{an
TH> array.} @@ -22,8 +22,18 @@ case of binary operators
TH> such as \code{"/"} etc., the function name must
TH> backquoted or quoted. (\code{FUN} is found by a call to
TH> \code{\link{match.fun}}.)} +
TH> \item{check.margin}{logical. If \code{TRUE} (the
TH> default), warn if the + length or dimensions of
TH> \code{STATS} do + not match the specified dimensions of
TH> \code{x}.} \item{\dots}{optional arguments to
TH> \code{FUN}.} } +\details{ + The consistency check among
TH> \code{STATS}, \code{MARGIN} and \code{x} + is stricter
TH> if \code{STATS} is an array than if it is a vector. +
TH> In the vector case, some kinds of recycling are allowed
TH> without a + warning. Use
TH> \code{sweep(x,MARGIN,as.array(STATS))} if \code{STATS} +
TH> is a vector and you want to be warned if any recycling
TH> occurs. +} \value{ An array with the same shape as
TH> \code{x}, but with the summary statistics swept out.
TH> ______________________________________________
TH> R-devel at r-project.org mailing list
TH> https://stat.ethz.ch/mailman/listinfo/r-devel
TH> ______________________________________________
TH> R-devel at r-project.org mailing list
TH> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list