[R] Comparison of means in survey package

Thu Aug 18 23:19:21 CEST 2011

On Fri, Aug 19, 2011 at 7:25 AM, Simon Kiss <sjkiss at gmail.com> wrote:
> Dear list colleagues,
> I'm trying to come up with a test question for undergraduates to illustrate comparison of means from a complex survey design. The data for the example looks roughly like this:
>
> mytest<-data.frame(harper=rnorm(500, mean=60, sd=1), party=sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA), size=500, replace=TRUE), natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1), size=500, replace=TRUE), gender=sample(c("Male", "Female"), size=500, replace=TRUE))
>
> Using svyby I can get the means for each group of interest (primarily the party variable), but I can't get further to actually do the comparison of means.  I saw a reference on the help listserv to the effect that the survey package does not do ttests and that one should use svyglm.  However, that was in 2009 and I see that there's a command, svytteset in the package which seems to be on point.  However, when I've tried that command I can't get it to work: it returns the following error message:
>
> t = NaN, df = 3255, p-value = NA
> alternative hypothesis: true difference in mean is not equal to 0
> sample estimates:
> difference in mean
>          38.80387

It would have been helpful to give your code.

mytest<-data.frame(harper=rnorm(500, mean=60, sd=1),
party=sample(c("BQ", "NDP", "Conservative", "Liberal", "None", NA),
size=500, replace=TRUE), natwgt=sample(c(0.88, 0.99, 1.43, 1.22, 1.1),
size=500, replace=TRUE), gender=sample(c("Male", "Female"), size=500,
replace=TRUE))

des<-svydesign(id=~1, weight=~natwgt,data=mytest)

> svyttest(harper~gender, design=des)

	Design-based t-test

data:  harper ~ gender
t = 0.7678, df = 498, p-value = 0.443
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
        0.06621512

You can compare one party to all others:

> svyttest(harper~I(party=="Liberal"), design=des)

	Design-based t-test

data:  harper ~ I(party == "Liberal")
t = -1.3194, df = 498, p-value = 0.1876
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
        -0.1432304

Or subset down to just two parties to compare:

> svyttest(harper~I(party=="Liberal"), design=subset(des,party %in% c("Liberal","Conservative")))

	Design-based t-test

data:  harper ~ I(party == "Liberal")
t = -0.5622, df = 181, p-value = 0.5747
alternative hypothesis: true difference in mean is not equal to 0
sample estimates:
difference in mean
       -0.08031931

Or convert the design to use replicate weights, so you then can
compute contrasts after svyby()

> rdes<-as.svrepdesign(des)
> support<-svyby(~harper,~party,svymean,design=rdes,covmat=TRUE)
> support
                    party   harper         se
BQ                     BQ 59.89231 0.10544441
Conservative Conservative 59.89364 0.10824393
Liberal           Liberal 59.81332 0.09534826
NDP                   NDP 59.98576 0.11232908
None                 None 60.07644 0.10327210
> svycontrast(support, quote(BQ-Conservative))
              nlcon     SE
contrast -0.0013296 0.1511
> svycontrast(support, quote(Conservative-Liberal))
            nlcon     SE
contrast 0.080319 0.1442

Though if I were doing this analysis I'd fit a linear regression model.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland