[Rd] Performance issue in stats:::weighted.mean.default method
Tadeáš Palusga
tadeas at palusga.cz
Thu Mar 5 15:55:15 CET 2015
Hi,
I'm using this mailing list for the first time and I hope this is the
right one. I don't think the following is a bug, but it may be a
performance issue.
In my opinion, there is no need to filter by [w != 0] in the last sum
of the weighted.mean.default method defined in
src/library/stats/R/weighted.mean.R. Summing the zero terms is
harmless, and the filtering itself is expensive (see the following
benchmark snippet):
library(microbenchmark)

x <- sample(500, 5000, replace = TRUE)
w <- sample(1000, 5000, replace = TRUE) / 1000 *
     ifelse((sample(10, 5000, replace = TRUE) - 1) > 0, 1, 0)

fun.new  <- function(x, w) sum(x * w) / sum(w)
fun.orig <- function(x, w) sum((x * w)[w != 0]) / sum(w)  # subsetting as in weighted.mean.default

print(microbenchmark(
    ORIGFN = fun.orig(x, w),
    NEWFN  = fun.new(x, w),
    times  = 1000))
# Results:
# Unit: microseconds
#    expr     min       lq      mean  median      uq      max neval
#  ORIGFN 190.889 194.6590 210.08952 198.847 202.928 1779.789  1000
#   NEWFN  20.857  21.7175  24.61149  22.080  22.594 1744.014  1000
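As a quick sanity check, both versions agree on the data above, since
the zero-weight products contribute exactly 0 to the sum:

# expect TRUE (fun.orig and fun.new as defined above)
all.equal(fun.orig(x, w), fun.new(x, w))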
So my suggestion is to remove the w != 0 filter:
Index: weighted.mean.R
===================================================================
--- weighted.mean.R (revision 67941)
+++ weighted.mean.R (working copy)
@@ -29,7 +29,7 @@
 	stop("'x' and 'w' must have the same length")
     w <- as.double(w) # avoid overflow in sum for integer weights.
     if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
-    sum((x*w)[w != 0])/sum(w) # --> NaN in empty case
+    sum(x*w)/sum(w) # --> NaN in empty case
 }

 ## see note for ?mean.Date
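The "NaN in empty case" behaviour noted in the comment is unchanged by
the patch, since sum() of an empty vector is 0 and 0/0 is NaN:

# empty input still yields NaN with the simplified expression
sum(numeric(0) * numeric(0)) / sum(numeric(0))  # NaN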
I hope I'm not missing something - I really don't see the reason for
this filtering here.
BR
Tadeas 'donarus' Palusga