[R] weighted mean
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Nov 26 14:26:57 CET 2003
Jason Turner <jasont at indigoindustrial.co.nz> writes:
> MZodet at ahrq.gov wrote:
>
> > How do I go about generating a WEIGHTED mean (and standard error) of a
> > variable (e.g., expenditures) for each level of a categorical variable
> > (e.g., geographic region)? I'm looking for something comparable to PROC
> > MEANS in SAS with both a class and weight statement.
>
> That's two questions.
> 1) to apply a weighted mean to a vector, see ?weighted.mean
> 2) to apply a function to data grouped by categorical variable, you
> probably need "by" or "tapply". See the help pages and examples for
> both.
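Combining those two hints, here is a minimal sketch of a per-group weighted mean. The data frame d and its columns expend, wt and region are made up for illustration. tapply() hands its function one vector per group, so one way to get both values and weights into weighted.mean() is to group the row indices instead.

```r
# Hypothetical data: expenditure, a weight, and a region factor
d <- data.frame(
  expend = c(120, 340, 95, 410, 280, 150),
  wt     = c(1.5, 2.0, 1.0, 3.0, 2.5, 1.0),
  region = factor(c("North", "North", "South", "South", "West", "West"))
)

# tapply() passes one vector per group, so group the row indices instead
# and use them to pick out both the values and the weights.
grp_means <- tapply(seq_len(nrow(d)), d$region,
                    function(i) weighted.mean(d$expend[i], d$wt[i]))
grp_means
```

by(d, d$region, function(s) weighted.mean(s$expend, s$wt)) should give the same numbers, returned as a "by" object rather than a named array.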
Three, actually. No one seems to have answered how to get the SD, and
that's a little more tricky.
The simplest (well, the quickest) way to get the weighted SD is to do
a weighted regression analysis with just an intercept term:
x <- c(3,4,5); w <- c(2,5,7) # just for testing
summary(lm(x~1, weights=w))$sigma
# same thing by hand: the weighted sum of squares on N-1 = 2 DF
m <- weighted.mean(x, w)   # the fitted intercept is the weighted mean
wss <- sum((x-m)^2*w)
sqrt(wss/2)
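The same calculation can be packaged as a small function and checked against lm(). This is a sketch: wsd_lm is a hypothetical name, and it assumes w are relative (precision-type) weights, so the DF are length(x) - 1. Under that same intercept-only model the standard error of the weighted mean is sigma/sqrt(sum(w)), which addresses the standard-error part of the question. Whether that matches the SE you need (e.g. for a complex survey design) is an assumption worth checking.

```r
# Weighted SD on n - 1 DF, treating w as relative weights; this equals the
# residual standard error of a weighted intercept-only regression.
wsd_lm <- function(x, w) {
  m <- weighted.mean(x, w)                     # the fitted intercept
  sqrt(sum(w * (x - m)^2) / (length(x) - 1))
}

x <- c(3, 4, 5); w <- c(2, 5, 7)
fit <- summary(lm(x ~ 1, weights = w))

s <- wsd_lm(x, w)
all.equal(s, fit$sigma)                                         # TRUE
# SE of the weighted mean under the same model: sigma / sqrt(sum(w))
all.equal(s / sqrt(sum(w)), fit$coefficients[1, "Std. Error"])  # TRUE
```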
Notice however that SAS also does frequency weighting where
(x=2.7,w=5) means that there are five observations of 2.7.
In that case, the brute-force approach is
sd(rep(x,w))
# which is the same as
sqrt(wss/13) # sum(w)-1 DF
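The frequency-weight version can be wrapped the same way, which avoids building the expanded vector that rep(x,w) creates when the counts are large. wsd_freq is a hypothetical name, and rejecting non-integer or negative weights is my assumption about how frequency weights ought to be validated.

```r
# Frequency-weighted SD: w[i] counts how many times x[i] was observed,
# so the DF are sum(w) - 1 rather than length(x) - 1.
wsd_freq <- function(x, w) {
  stopifnot(all(w == round(w)), all(w >= 0))  # counts: non-negative integers
  m <- weighted.mean(x, w)
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))
}

x <- c(3, 4, 5); w <- c(2, 5, 7)
all.equal(wsd_freq(x, w), sd(rep(x, w)))  # TRUE
```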
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907