[R] Direct Method Age-Adjustment to Complex Survey Data
Thomas Lumley
tlumley at uw.edu
Mon Aug 13 06:27:07 CEST 2012
On Sat, Aug 11, 2012 at 5:53 AM, Anthony Damico <ajdamico at gmail.com> wrote:
> Hi everyone, my apologies in advance if I'm overlooking something simple in
> this question. I am trying to use R's survey package to make a direct
> method age-adjustment to some complex survey data. I have played with
> postStratify, calibrate, rake, and simply multiplying the base weights by
> the correct proportions - nothing seems to hit the published numbers on the
> nose.
<snip>
> # but matching the figure exactly requires an exact age adjustment.
>
> # create the population types vector
> pop.types <-
> data.frame(
> agecat = 0:3 ,
> Freq = c( 55901 , 77670 , 72816 , 45364 )
> )
>
>
> z.postStratified <- postStratify( z , ~agecat , pop.types , partial = T )
The standardisation in the CDC examples is within each subpopulation.
That is, they standardise each race/ethnicity group to the Census age
structure, rather than standardising the whole population. That's the
whole point -- they want to look at an imaginary population where age
and race aren't confounded.
When I do this, the results almost exactly match the published
figures. The next step was to drop all the records with missing
cholesterol values and reweight just the non-missing data. That
matches exactly. (I also think you have the wrong recoding of RIDRETH1.)
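As a rough illustration of what standardising within each group means, here is a minimal base-R sketch on made-up numbers. The group labels and prevalences are hypothetical, and pairing the three adult counts from the thread with the 20-39, 40-59 and 60+ bands is an assumption based on the order of the age categories. It is not the survey-package computation itself, only the arithmetic behind it.

```r
# Made-up prevalences for two hypothetical groups, by age band
rates <- data.frame(
  group  = rep(c("A", "B"), each = 3),
  agecat = rep(c("20-39", "40-59", "60+"), times = 2),
  prev   = c(0.10, 0.20, 0.30, 0.05, 0.25, 0.40),
  stringsAsFactors = FALSE
)

# Standard age structure: the three adult counts quoted in the thread (assumed pairing)
std   <- c("20-39" = 77670, "40-59" = 72816, "60+" = 45364)
std_w <- std / sum(std)

# Within each group, weight the age-specific prevalences by the same standard shares
adjusted <- sapply(split(rates, rates$group),
                   function(d) sum(d$prev * std_w[d$agecat]))
adjusted
```

Because both groups are weighted by the same age shares, any remaining difference between them cannot be due to their age structures.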
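library(foreign)   # read.xport()
library(survey)    # svydesign(), svytable(), postStratify(), svyby()
demog <- read.xport("~/Downloads/demo_f.xpt")
chol <- read.xport("~/Downloads/TCHOL_f.xpt")
alldata <- merge(demog, chol)
# RIDSTATR == 2: both interviewed and examined
alldata <- subset(alldata, RIDSTATR %in% 2)
alldata <- transform(alldata, HI_CHOL = ifelse(LBXTC >= 240, 1, 0))
# collapse the two Hispanic codes of RIDRETH1 into one group
alldata <- transform(alldata, race = c(1, 1, 2, 3, 4)[RIDRETH1])
alldata <- transform(alldata, agecat = cut(RIDAGEYR, c(0, 19, 39, 59, Inf)))
# the design must exist before svytable() is called on it
design <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, nest = TRUE,
                    weights = ~WTMEC2YR, data = alldata)
# Census age structure, one count per level of agecat
popage <- c(55901, 77670, 72816, 45364)
# targets: each race/gender total spread over the standard age shares
racegender <- as.data.frame(svytable(~race + RIAGENDR, design))
racegenderage <- expand.grid(race = 1:4, RIAGENDR = 1:2,
                             agecat = levels(alldata$agecat))
racegenderage$Freq <- as.vector(outer(racegender$Freq, popage / sum(popage)))
svyby(~HI_CHOL, ~race + RIAGENDR,
      design = subset(postStratify(design, ~race + RIAGENDR + agecat, racegenderage),
                      RIDAGEYR >= 20),
      svymean, na.rm = TRUE)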
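The target table relies on outer() and expand.grid() filling cells in the same order, with the first variable varying fastest. A small base-R check on made-up totals (all numbers and the two-level age variable are hypothetical) suggests the alignment holds:

```r
# Made-up race/gender totals, listed with race varying fastest
race_gender <- expand.grid(race = 1:2, RIAGENDR = 1:2)
race_gender$Freq <- c(100, 200, 300, 400)

# Made-up standard age shares
age_share <- c(young = 0.25, old = 0.75)

# Same construction as in the thread: group total times age share, flattened column-wise
target <- expand.grid(race = 1:2, RIAGENDR = 1:2, agecat = names(age_share))
target$Freq <- as.vector(outer(race_gender$Freq, age_share))

# Summing over age should give back the original race/gender totals
xtabs(Freq ~ race + RIAGENDR, data = target)
```

If the two orderings disagreed, the margins of this table would no longer equal the original group totals.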
# drop records with missing cholesterol, then rebuild the design on complete cases
somedata <- subset(alldata, !is.na(LBXTC))
design1 <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, nest = TRUE,
                     weights = ~WTMEC2YR, data = somedata)
svyby(~HI_CHOL, ~race + RIAGENDR,
      design = subset(postStratify(design1, ~race + RIAGENDR + agecat, racegenderage),
                      RIDAGEYR >= 20),
      svymean, na.rm = TRUE)
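One way to see why dropping the missing values before post-stratifying makes a difference is a toy base-R comparison. Everything here is made up for illustration, including the helper name ps_weights and the 50/50 target; it only mimics the reweighting idea and does not use the survey package.

```r
# Made-up toy data: equal base weights, one missing outcome among the young
toy <- data.frame(
  agecat = c("young", "young", "young", "old", "old", "old"),
  w      = rep(1, 6),
  y      = c(0, 0, NA, 1, 1, 0),
  stringsAsFactors = FALSE
)
target <- c(young = 0.5, old = 0.5)   # hypothetical standard age shares

# Hypothetical helper: rescale weights so each age group's weighted share hits the target
ps_weights <- function(d, target) {
  group_total <- ave(d$w, d$agecat, FUN = sum)
  d$w * sum(d$w) * unname(target[d$agecat]) / group_total
}

ok <- !is.na(toy$y)

# (a) post-stratify everyone, then ignore the missing values
w_all   <- ps_weights(toy, target)
share_a <- tapply(w_all[ok], toy$agecat[ok], sum) / sum(w_all[ok])

# (b) drop the missing values first, then post-stratify the complete cases
cc      <- toy[ok, ]
w_cc    <- ps_weights(cc, target)
share_b <- tapply(w_cc, cc$agecat, sum) / sum(w_cc)

share_a   # age shares actually used in the estimate under (a)
share_b   # age shares actually used in the estimate under (b)
```

Under (a) the records that contribute to the mean no longer follow the target age distribution, while under (b) they do, which is consistent with the second analysis matching the published figures.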
-thomas
--
Thomas Lumley
Professor of Biostatistics
University of Auckland