[Rd] Performance issue in stats:::weighted.mean.default method
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Mar 5 18:49:59 CET 2015
On 05/03/2015 14:55, Tadeáš Palusga wrote:
> Hi,
> I'm using this mailing list for the first time and I hope this is the
> right one. I don't think that the following is a bug but it can be a
> performance issue.
>
> In my opinion, there is no need to filter by [w != 0] in the last sum of
> the weighted.mean.default method defined in
> src/library/stats/R/weighted.mean.R. There is no need to do it because
> adding zeros does not change a sum, and the filtering is expensive (see
> the following benchmark snippet).
But 0*x is not necessarily 0 (in IEEE arithmetic 0*Inf is NaN), so there
is a need to do it. See:
> w <- c(0, 1)
> x <- c(Inf, 1)
> weighted.mean(x, w)
[1] 1
> fun.new(x, w)
[1] NaN
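
The arithmetic behind that result can be checked directly. A minimal
sketch, assuming fun.new is the proposed one-liner from the quoted
message; fun.filtered is a name introduced here for illustration and
mirrors the filtered line shown in the patch:

```r
# Proposed replacement from the quoted message: no filtering on zero weights.
fun.new <- function(x, w) sum(x * w) / sum(w)

# Hypothetical helper mirroring the filtered line in weighted.mean.R.
fun.filtered <- function(x, w) sum((x * w)[w != 0]) / sum(w)

w <- c(0, 1)
x <- c(Inf, 1)

0 * Inf              # NaN under IEEE 754 arithmetic, not 0
fun.new(x, w)        # the NaN term propagates through sum()
fun.filtered(x, w)   # the zero-weight term is dropped before summing
```

The same reasoning applies to any zero-weighted element whose product
with 0 is not 0, such as -Inf or NaN.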
>
>
>
> library(microbenchmark)
> x <- sample(500,5000,replace=TRUE)
> w <- sample(1000,5000,replace=TRUE)/1000 *
> ifelse((sample(10,5000,replace=TRUE) -1) > 0, 1, 0)
> fun.new <- function(x,w) {sum(x*w)/sum(w)}
> fun.orig <- function(x,w) {sum(x*w[w!=0])/sum(w)}
> print(microbenchmark(
> ORIGFN = fun.orig(x,w),
> NEWFN = fun.new(x,w),
> times = 1000))
>
> #results:
> #Unit: microseconds
> #   expr     min       lq      mean  median      uq      max neval
> # ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789  1000
> #  NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014  1000
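
One caveat when reading these timings: fun.orig as quoted subsets w
before multiplying (x*w[w!=0]), whereas the line in the patch subsets the
product ((x*w)[w != 0]), so the two are not the same expression. A sketch
of a like-for-like comparison follows. It uses base R's system.time
rather than microbenchmark so that it is self-contained; fun.patch and
time_it are names introduced here for illustration, the seed is
arbitrary, and no particular timings are claimed.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
n <- 5000
x <- sample(500, n, replace = TRUE)
# Roughly one weight in ten is zero, as in the quoted benchmark.
w <- sample(1000, n, replace = TRUE) / 1000 *
  ifelse((sample(10, n, replace = TRUE) - 1) > 0, 1, 0)

fun.new   <- function(x, w) sum(x * w) / sum(w)
fun.patch <- function(x, w) sum((x * w)[w != 0]) / sum(w)

# With finite x the two agree, so only the cost differs.
stopifnot(isTRUE(all.equal(fun.new(x, w), fun.patch(x, w))))

# Hypothetical timing helper: elapsed seconds for `reps` calls of f(x, w).
time_it <- function(f, reps = 1000) {
  system.time(for (i in seq_len(reps)) f(x, w))[["elapsed"]]
}
c(patch = time_it(fun.patch), new = time_it(fun.new))
```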
>
> So my suggestion is to remove the w != 0 check:
>
> Index: weighted.mean.R
> ===================================================================
> --- weighted.mean.R (revision 67941)
> +++ weighted.mean.R (working copy)
> @@ -29,7 +29,7 @@
> stop("'x' and 'w' must have the same length")
> w <- as.double(w) # avoid overflow in sum for integer weights.
> if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
> - sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
> + sum(x*w)/sum(w) # --> NaN in empty case
> }
>
> ## see note for ?mean.Date
>
>
> I hope I'm not missing something - I really don't see the reason to have
> this filtering here.
>
> BR
>
> Tadeas 'donarus' Palusga
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK