[R] sweep() and recycling
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jun 22 12:12:06 CEST 2005
I think that as the proponents do not agree, we need to leave this as is.
BTW, R-devel is the place to discuss patches to R, rather than R-help.
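
For readers following the thread, the behaviour at issue can be reproduced with a small sketch. It assumes the sweep() body quoted below with all warning logic removed. The names sweep_plain and warns are illustrative only and are not part of R or of either proposed patch.

```r
# The sweep() body quoted in this thread, minus any warning logic.
# sweep_plain is a hypothetical name used only for this illustration.
sweep_plain <- function(x, MARGIN, STATS, FUN = "-", ...) {
  FUN <- match.fun(FUN)
  dims <- dim(x)
  # Put the swept margins first, then the remaining dimensions.
  perm <- c(MARGIN, seq_along(dims)[-MARGIN])
  # array() recycles STATS to the full size; aperm() restores the order.
  FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}

# Heather Turner's example: a length-2 STATS recycled over 50 rows
# gives the same result as writing the 50 values out in full.
m <- matrix(1:100, 50, 2)
short <- sweep_plain(m, 1, c(1, 1000), "+")
full  <- sweep_plain(m, 1, rep(c(1, 1000), 25), "+")
identical(short, full)       # TRUE

# Helper (hypothetical): does evaluating an expression raise a warning?
warns <- function(expr) tryCatch({ expr; FALSE }, warning = function(w) TRUE)

warns(matrix(1:5, 6, 6))     # TRUE: matrix() checks the data length
warns(array(1:5, c(6, 6)))   # FALSE: array() recycles silently
# Inexact recycling (24 is a multiple of 3, but 3 does not match the
# 4 x 3 margin) also passes silently, because only array() is involved.
warns(sweep_plain(array(1:24, dim = c(4, 3, 2)), 1:2, 1:3))  # FALSE
```

The first comparison is the case where a warning would be a false alarm; the last call is the case where one would be useful. The two proposed checks differ on where to draw that line.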
On Tue, 21 Jun 2005, Heather Turner wrote:
> Agreed, my examples may be trivial, and I'm sure there are more efficient ways to do the same thing, but I disagree that my examples are against the spirit of sweep(). In the first example a vector is swept out from the rows, with one value for odd rows and one value for even rows. In the second example an array of values is swept out across the third dimension. In the third example an array of values is swept out from the full array.
>
> The first example is a natural use of recycling. E.g.
>
> sweep(matrix(1:100, 50, 2), 1, c(1, 1000), "+", give.warning = TRUE)
>
> is a quicker way of writing
>
> sweep(matrix(1:100, 50, 2), 1, rep(c(1, 1000), 25), "+", give.warning = TRUE)
>
> but your code would give a warning in the first case, even though the intent and the result are exactly the same as in the second case.
>
> As you say, it is only a warning, which can be ignored. However, the warning should at least reflect the warning condition used, i.e. warn that the length of STATS does not equal the extent of MARGIN, rather than warning that STATS does not recycle exactly.
>
> Heather
>
>>>> Robin Hankin <r.hankin at noc.soton.ac.uk> 06/21/05 02:47pm >>>
> Hi
>
> On Jun 21, 2005, at 02:33 pm, Heather Turner wrote:
>
>> I think the warning condition in Robin's patch is too harsh - the
>> following examples seem reasonable to me, but all produce warnings
>>
>> sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE)
>> sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE)
>> sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE)
>>
>
>
> The examples above do give warnings (as intended), but I think all three
> cases are inimical to the spirit of sweep(): nothing is being "swept" out.
>
> So a warning is appropriate, IMO.
>
> In any case, one can always suppress (or ignore!) a warning if one knows
> what one is doing. YMMV, but if I wanted to do the above operations I
> would replace
>
>
> sweep(array(0, dim = c(4,3,2)), c(1,3), 1:12, "+", give.warning = FALSE)
>
> with
>
> aperm(array(1:12,c(4,2,3)),c(1,3,2))
>
>
> best wishes
>
> rksh
>
>> I have written an alternative (given below) which does not give
>> warnings in the above cases, but does warn in the following case
>>
>>> sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)
>> , , 1
>>
>>      [,1] [,2] [,3]
>> [1,]    0    3    6
>> [2,]    0    3    9
>> [3,]    0    6    9
>> [4,]    3    6    9
>>
>> , , 2
>>
>>      [,1] [,2] [,3]
>> [1,]   12   15   18
>> [2,]   12   15   21
>> [3,]   12   18   21
>> [4,]   15   18   21
>>
>> Warning message:
>> STATS does not recycle exactly across MARGIN
>>
>> The code could be easily modified to warn in other cases, e.g. when
>> length of STATS is a divisor of the corresponding array extent (as in
>> the first example above, with length(STATS) = 2).
>>
>> The code also includes Gabor's suggestion.
>>
>> Heather
>>
>> sweep <- function (x, MARGIN, STATS, FUN = "-",
>>                    warn = getOption("warn"), ...)
>> {
>>     FUN <- match.fun(FUN)
>>     dims <- dim(x)
>>     perm <- c(MARGIN, (1:length(dims))[-MARGIN])
>>     if (warn >= 0) {
>>         s <- length(STATS)
>>         cumDim <- c(1, cumprod(dims[perm]))
>>         if (s > max(cumDim))
>>             warning("length of STATS greater than length of array",
>>                     call. = FALSE)
>>         else {
>>             upper <- min(ifelse(cumDim > s, cumDim, max(cumDim)))
>>             lower <- max(ifelse(cumDim < s, cumDim, min(cumDim)))
>>             if (any(upper %% s != 0, s %% lower != 0))
>>                 warning("STATS does not recycle exactly across MARGIN",
>>                         call. = FALSE)
>>         }
>>     }
>>     FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
>> }
>>
>>>>> Gabor Grothendieck <ggrothendieck at gmail.com> 06/21/05 01:25pm >>>
>>
>> Perhaps the signature should be:
>>
>> sweep(...other args go here..., warn=getOption("warn"))
>>
>> so that the name and value of the argument are consistent with
>> the R warn option.
>>
>> On 6/21/05, Robin Hankin <r.hankin at noc.soton.ac.uk> wrote:
>>>
>>> On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote:
>>>
>>>> The issue here is that the equivalent command array(1:5, c(6,6)) (to
>>>> matrix(1:5,6,6)) gives no warning, and sweep uses array().
>>>>
>>>> I am not sure either should: fractional recycling was normally
>>>> allowed in S3 (S4 tightened up a bit).
>>>>
>>>> Perhaps someone who thinks sweep() should warn could contribute a
>>>> tested patch?
>>>>
>>>
>>>
>>> OK, modified R code and Rd file below (is this the best way to do this?)
>>>
>>>
>>>
>>>
>>> "sweep" <-
>>> function (x, MARGIN, STATS, FUN = "-", give.warning = FALSE, ...)
>>> {
>>>     FUN <- match.fun(FUN)
>>>     dims <- dim(x)
>>>     if (give.warning && length(STATS) > 1 &&
>>>         any(dims[MARGIN] != dim(as.array(STATS)))) {
>>>         warning("array extents do not recycle exactly")
>>>     }
>>>     perm <- c(MARGIN, (1:length(dims))[-MARGIN])
>>>     FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
>>> }
>>>
>>> \name{sweep}
>>> \alias{sweep}
>>> \title{Sweep out Array Summaries}
>>> \description{
>>> Return an array obtained from an input array by sweeping out a
>>> summary statistic.
>>> }
>>> \usage{
>>> sweep(x, MARGIN, STATS, FUN="-", give.warning = FALSE, \dots)
>>> }
>>> \arguments{
>>> \item{x}{an array.}
>>> \item{MARGIN}{a vector of indices giving the extents of \code{x}
>>> which correspond to \code{STATS}.}
>>> \item{STATS}{the summary statistic which is to be swept out.}
>>> \item{FUN}{the function to be used to carry out the sweep. In the
>>> case of binary operators such as \code{"/"} etc., the function
>>> name must be quoted.}
>>> \item{give.warning}{Boolean, with default \code{FALSE} meaning to
>>> give no warning, even if array extents do not match. If
>>> \code{TRUE}, check for the correct dimensions and if a
>>> mismatch is detected, give a suitable warning.}
>>> \item{\dots}{optional arguments to \code{FUN}.}
>>> }
>>> \value{
>>> An array with the same shape as \code{x}, but with the summary
>>> statistics swept out.
>>> }
>>> \note{
>>> If \code{STATS} is of length 1, recycling is carried out with no
>>> warning irrespective of the value of \code{give.warning}.
>>> }
>>>
>>> \references{
>>> Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>>> \emph{The New S Language}.
>>> Wadsworth \& Brooks/Cole.
>>> }
>>> \seealso{
>>> \code{\link{apply}} on which \code{sweep} used to be based;
>>> \code{\link{scale}} for centering and scaling.
>>> }
>>> \examples{
>>> require(stats) # for median
>>> med.att <- apply(attitude, 2, median)
>>> sweep(data.matrix(attitude), 2, med.att)# subtract the column medians
>>>
>>> a <- array(0, c(2, 3, 4))
>>> b <- matrix(1:8, 2, 4)
>>> sweep(a, c(1, 3), b, "+", give.warning = TRUE) # no warning: all(dim(a)[c(1,3)] == dim(b))
>>> sweep(a, c(1, 2), b, "+", give.warning = TRUE) # warning given
>>>
>>> }
>>> \keyword{array}
>>> \keyword{iteration}
>>>
>>>
>>>
>>>
>>> --
>>> Robin Hankin
>>> Uncertainty Analyst
>>> National Oceanography Centre, Southampton
>>> European Way, Southampton SO14 3ZH, UK
>>> tel 023-8059-7743
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>>
>>
>>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595