[R] weighted mean

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Nov 26 14:26:57 CET 2003


Jason Turner <jasont at indigoindustrial.co.nz> writes:

> MZodet at ahrq.gov wrote:
> 
> > How do I go about generating a WEIGHTED mean (and standard error) of a
> > variable (e.g., expenditures) for each level of a categorical variable
> > (e.g., geographic region)?  I'm looking for something comparable to PROC
> > MEANS in SAS with both a class and weight statement.
> 
> That's two questions.
> 1) to apply a weighted mean to a vector, see ?weighted.mean
 
> 2) to apply a function to data grouped by categorical variable, you
> probably need "by" or "tapply".  See the help pages and examples for
> both.
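A minimal sketch of combining the two, assuming a hypothetical data frame `dat` with columns `expend` (the variable), `wt` (the weights) and `region` (the grouping factor). None of these names come from the original question. Plain `tapply()` only hands one vector to the function, so here `split()` carries the weights along with the values. `by()` does the same job and returns a "by" object.

```r
# Hypothetical example data: expenditures, weights, and region
dat <- data.frame(
  expend = c(120, 340, 560, 200, 410, 90),
  wt     = c(1.5, 2.0, 0.5, 1.0, 3.0, 2.5),
  region = factor(c("North", "North", "North", "South", "South", "South"))
)

# Weighted mean of expend within each level of region
wmeans <- sapply(split(dat, dat$region),
                 function(d) weighted.mean(d$expend, d$wt))
wmeans

# The same computation with by()
by(dat, dat$region, function(d) weighted.mean(d$expend, d$wt))
```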

Three, actually. No one seems to have answered how to get the SD, and
that's a little more tricky.

The simplest (well, the quickest) way to get the weighted SD is to do
a weighted regression analysis with just an intercept term:

x <- c(3,4,5); w <- c(2,5,7) # just for testing
summary(lm(x~1,weights=w))$sigma

# the same value, from the weighted sum of squares on N-1 DF

m <- weighted.mean(x,w)
wss <- sum((x-m)^2*w)
sqrt(wss/2)
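A sketch that wraps the two routes in a helper and checks that they agree. `wsd()` is a hypothetical name, not a base R function. The same `lm()` fit also reports a standard error for the intercept (which is the weighted mean); for an intercept-only model this is `sigma/sqrt(sum(w))`. Note that this treats the weights the way `lm()` does, as inverse-variance (precision) weights. That is an assumption, and not necessarily what a survey analysis with sampling weights would report as the standard error.

```r
# Weighted SD on N-1 degrees of freedom (hypothetical helper, not in base R)
wsd <- function(x, w) {
  m <- weighted.mean(x, w)
  sqrt(sum(w * (x - m)^2) / (length(x) - 1))
}

x <- c(3, 4, 5); w <- c(2, 5, 7)
fit <- summary(lm(x ~ 1, weights = w))

# Both routes give the same weighted SD
all.equal(wsd(x, w), fit$sigma)

# Standard error of the intercept (the weighted mean),
# under lm()'s precision-weight interpretation
se <- coef(fit)[1, "Std. Error"]
all.equal(se, fit$sigma / sqrt(sum(w)))
```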


Notice, however, that SAS also does frequency weighting, where
(x=2.7, w=5) means that there are five observations of 2.7.

In that case, the brute-force approach is

sd(rep(x,w))

# which is the same as

sqrt(wss/13) # sum(w)-1 DF
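The `rep()` trick needs whole-number weights and builds the expanded vector in memory. A sketch of computing the same quantity directly, with `sum(w) - 1` in the denominator, is below. `fsd()` and the grouping factor `g` are hypothetical, and `w` is assumed to hold non-negative whole-number counts. The last line applies it per group in the spirit of the `tapply` suggestion, passing row indices so each group sees both its values and its weights.

```r
# SD for frequency weights: w[i] counts how many times x[i] was observed
# (hypothetical helper; assumes w contains non-negative whole numbers)
fsd <- function(x, w) {
  m <- weighted.mean(x, w)
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))
}

x <- c(3, 4, 5); w <- c(2, 5, 7)

# Agrees with the brute-force expansion
all.equal(fsd(x, w), sd(rep(x, w)))

# Per-group version, passing row indices through tapply()
g <- factor(c("a", "a", "b"))
res <- tapply(seq_along(x), g, function(i) fsd(x[i], w[i]))
res
```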

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907




More information about the R-help mailing list