[R] Using "survey" package with ACS PUMS
Anthony Damico
ajdamico at gmail.com
Tue Sep 30 15:29:42 CEST 2014
hi michael, you probably need
options( "survey.replicates.mse" = TRUE )
i also think you don't want type = "Fay" and you do want scale = 4/80 and
rscales = rep( 1 , 80 ) as well, but what you have might be equivalent
(not sure)
regardless, this blog post details how to precisely replicate the census
bureau's estimates using the acs pums with R
http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29
On Tue, Sep 30, 2014 at 9:17 AM, <Michael.Laviolette at dhhs.state.nh.us>
wrote:
>
> I'm trying to reproduce some results from the American Community Survey
> PUMS data using the "survey" package. I'm using the one-year 2012 estimates
> for New Hampshire
> (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the
> estimates for user verification from
>
> http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv
>
> Once the age groups are set up as specified in the verification estimates,
> the following SAS code produces the correct estimated totals with standard
> errors:
>
> proc surveyfreq data = acs2012 varmethod = jackknife;
> weight pwgtp;
> repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
> table SEX agegroup;
> run;
>
> I've not been successful in reproducing the standard errors with R,
> although they are very close. My code follows; what revisions do I need to
> make?
>
> Thanks,
> Mike L.
>
> # load estimates for verification
> pums_est <- read.csv("pums_estimates_12.csv")
> pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))
>
> # load PUMS data
> pums_p <- read.csv("ss12pnh.csv")
> # convert sex and age group to factors
> pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
> pums_p$agegrp <- cut(pums_p$AGEP,
> c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
> right = FALSE)
>
> # create replicate-weight survey object
> library(survey)
> pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
> weights = pums_p[,7],
> combined.weights = TRUE,
> type = "Fay", rho = 1 - 1/sqrt(4),
> scale = 1, rscales = 1,
> data = pums_p)
>
> # using type "JK1" with scale = 4/80 and rscales = rep(1,80)
> # seems to produce the same results
>
> # total population by sex with SE's
> by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round(by.sex[1,4:5])
> # se1 se2
> # 33 1606 1606
> # compare results with Census
> pums_est[966:967, 5]
> #[1] 1610 1610
>
> # total population by age group with SE's
> by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round((by.agegrp)[15:27])
> # se1 se2 se3 se4 se5 se6 se7 se8 se9 se10 se11 se12 se13
> # 33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200 880 1700 1678
> # compare results with Census
> pums_est[968:980, 5]
> # [1] 874 2578 2613 1463 1399 1476 1493 1555 2191 2200 880 1702 1684
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list