[R] Using "survey" package with ACS PUMS
Michael.Laviolette at dhhs.state.nh.us
Michael.Laviolette at dhhs.state.nh.us
Tue Sep 30 15:17:28 CEST 2014
I'm trying to reproduce some results from the American Community Survey
PUMS data using the "survey" package. I'm using the one-year 2012 estimates
for New Hampshire
(http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the
estimates for user verification from
http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv
Once the age groups are set up as specified in the verification estimates,
the following SAS code produces the correct estimated totals with standard
errors:
proc surveyfreq data = acs2012 varmethod = jackknife;
weight pwgtp;
repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
table SEX agegroup;
run;
I've not been successful in reproducing the standard errors with R,
although they are very close. My code follows; what revisions do I need to
make?
Thanks,
Mike L.
# load estimates for verification
pums_est <- read.csv("pums_estimates_12.csv")
pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))
# load PUMS data
pums_p <- read.csv("ss12pnh.csv")
# convert sex and age group to factors
pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
pums_p$agegrp <- cut(pums_p$AGEP,
c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
right = FALSE)
# create replicate-weight survey object
library(survey)
pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
weights = pums_p[,7],
combined.weights = TRUE,
type = "Fay", rho = 1 - 1/sqrt(4),
scale = 1, rscales = 1,
data = pums_p)
# using type "JK1" with scale = 4/80 and rscales = rep(1,80)
# seems to produce the same results
# total population by sex with SE's
by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
round(by.sex[1,4:5])
# se1 se2
# 33 1606 1606
# compare results with Census
pums_est[966:967, 5]
#[1] 1610 1610
# total population by age group with SE's
by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
round((by.agegrp)[15:27])
# se1 se2 se3 se4 se5 se6 se7 se8 se9 se10 se11 se12 se13
# 33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200 880 1700 1678
# compare results with Census
pums_est[968:980, 5]
# [1] 874 2578 2613 1463 1399 1476 1493 1555 2191 2200 880 1702 1684
More information about the R-help
mailing list