[R] Comparison of means in survey package
Thomas Lumley
tlumley at uw.edu
Thu Aug 18 23:19:21 CEST 2011
On Fri, Aug 19, 2011 at 7:25 AM, Simon Kiss <sjkiss at gmail.com> wrote:
> Dear list colleagues,
> I'm trying to come up with a test question for undergraduates to illustrate comparison of means from a complex survey design. The data for the example looks roughly like this:
>
> mytest<-data.frame(harper=rnorm(500, mean=60, sd=1), party=sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA), size=500, replace=TRUE), natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1), size=500, replace=TRUE), gender=sample(c("Male", "Female"), size=500, replace=TRUE))
>
> Using svyby I can get the means for each group of interest (primarily the party variable), but I can't get further to actually do the comparison of means. I saw a reference on the help listserv to the effect that the survey package does not do ttests and that one should use svyglm. However, that was in 2009 and I see that there's a command, svytteset in the package which seems to be on point. However, when I've tried that command I can't get it to work: it returns the following error message:
>
> t = NaN, df = 3255, p-value = NA
> alternative hypothesis: true difference in mean is not equal to 0
> sample estimates:
> difference in mean
> 38.80387
It would have been helpful to give your code.
mytest<-data.frame(harper=rnorm(500, mean=60, sd=1),
party=sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA),
size=500, replace=TRUE), natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1),
size=500, replace=TRUE), gender=sample(c("Male", "Female"), size=500,
replace=TRUE))
des<-svydesign(id=~1, weight=~natwgt,data=mytest)
> svyttest(harper~gender, design=des)
Design-based t-test
data: harper ~ gender
t = 0.7678, df = 498, p-value = 0.443
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
0.06621512
You can compare one party to all others:
> svyttest(harper~I(party=="Liberal"), design=des)
Design-based t-test
data: harper ~ I(party == "Liberal")
t = -1.3194, df = 498, p-value = 0.1876
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
-0.1432304
Or subset down to just two parties to compare:
> svyttest(harper~I(party=="Liberal"), design=subset(des,party %in% c("Liberal","Conservative")))
Design-based t-test
data: harper ~ I(party == "Liberal")
t = -0.5622, df = 181, p-value = 0.5747
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
-0.08031931
Or convert the design to use replicate weights, so you then can
compute contrasts after svyby()
> rdes<-as.svrepdesign(des)
> support<-svyby(~harper,~party,svymean,design=rdes,covmat=TRUE)
> support
party harper se
BQ BQ 59.89231 0.10544441
Conservative Conservative 59.89364 0.10824393
Liberal Liberal 59.81332 0.09534826
NDP NDP 59.98576 0.11232908
None None 60.07644 0.10327210
> svycontrast(support, quote(BQ-Conservative))
nlcon SE
contrast -0.0013296 0.1511
> svycontrast(support, quote(Conservative-Liberal))
nlcon SE
contrast 0.080319 0.1442
Though if I were doing this analysis I'd fit a linear regression model.
-thomas
--
Thomas Lumley
Professor of Biostatistics
University of Auckland
More information about the R-help
mailing list