[Rd] tapply with weighted.mean
Martyn Plummer
plummer at iarc.fr
Wed Jan 26 16:40:43 CET 2005
We were caught out recently when attempting to use tapply to get a table of
weighted means. This gives the wrong answer (or, more correctly, not
the answer we were expecting), as the following example shows:
R> x <- 1:10 #some data
R> w <- c(1:5,5:1) #weights
R> id <- rep(1:2,rep(5,2)) #id values
R> weighted.mean(x[id==1],w[id==1]) #Weighted mean of x in group 1
[1] 3.666667
R> weighted.mean(x[id==2],w[id==2]) #Weighted mean of x in group 2
[1] 7.333333
R> tapply(x,INDEX=id,FUN=weighted.mean,w=w) #Wrong!
1 2
3 8
The reason for this is that tapply splits its first argument by the
INDEX variable, but does not split any of the arguments supplied via '...'.
So the result is
c(weighted.mean(x[id==1],w), weighted.mean(x[id==2],w))
and, because weighted.mean does not check lengths, R silently recycles the
shorter vector (x[id==1], of length 5) to match the length of the longer
one (w, of length 10).
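A short sketch that makes the mechanism visible, reusing x, w and id from the example above. The "what group 1 computed" line assumes weighted.mean reduces to sum(x * w) / sum(w), which agrees with the 3 in the tapply output. The workaround (letting tapply split the indices and subsetting x and w together inside FUN) is one common idiom, not the only fix.

```r
x  <- 1:10                 # some data
w  <- c(1:5, 5:1)          # weights
id <- rep(1:2, rep(5, 2))  # id values

# Arguments passed to tapply via '...' reach FUN unchanged:
# each group sees the full 10-element w, not its own 5 weights.
w_len <- tapply(x, INDEX = id, FUN = function(xi, w) length(w), w = w)
print(w_len)   # 10 in both groups

# What group 1 effectively computed: x[id == 1] (length 5) is recycled
# against w (length 10) when the two are multiplied.
naive_g1 <- sum(x[id == 1] * w) / sum(w)
print(naive_g1)  # 3 rather than 3.666667

# One workaround: let tapply split the *indices*, then subset x and w
# together inside FUN so each group uses only its own weights.
wm <- tapply(seq_along(x), INDEX = id,
             FUN = function(i) weighted.mean(x[i], w[i]))
print(wm)        # 3.666667 and 7.333333
```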
I draw two conclusions from this:
1) weighted.mean(x,w) should include a length check for w. The
documentation says it should be the same length as x, so this should be
enforced.
2) More importantly, the help page for tapply should explicitly warn the
user that optional arguments supplied to 'FUN' are not split by 'INDEX'.
I really only understood the behaviour of tapply after inspecting the
code. Then it became obvious why this could never work.
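To illustrate point 1, here is a minimal sketch of a weighted mean that enforces the documented length requirement. The name weighted_mean_checked is hypothetical and is not part of base R; it only shows how such a check turns the silent wrong answer into an error.

```r
# Hypothetical helper, not part of base R: a weighted mean that refuses
# to recycle w, in the spirit of point 1.
weighted_mean_checked <- function(x, w, na.rm = FALSE) {
  if (length(w) != length(x))
    stop("'x' and 'w' must have the same length")
  if (na.rm) {
    keep <- !is.na(x)   # drop missing x values together with their weights
    x <- x[keep]
    w <- w[keep]
  }
  sum(x * w) / sum(w)
}

x  <- 1:10
w  <- c(1:5, 5:1)
id <- rep(1:2, rep(5, 2))

# Matching lengths: same answer as weighted.mean for group 1.
g1 <- weighted_mean_checked(x[id == 1], w[id == 1])

# The tapply call from the example now fails loudly instead of
# returning 3 and 8.
msg <- tryCatch(tapply(x, INDEX = id, FUN = weighted_mean_checked, w = w),
                error = conditionMessage)
print(msg)
```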
I hope I am not being too obtuse. Any objections before I make these
changes?
Martyn