[R] Using "survey" package with ACS PUMS

Anthony Damico ajdamico at gmail.com
Tue Sep 30 15:29:42 CEST 2014


hi michael, you probably need

options( "survey.replicates.mse" = TRUE )


i also think you don't want type = "Fay" and you do want scale = 4/80 and
rscales = rep( 1 , 80 )  as well, but what you have might be equivalent
(not sure)


regardless, this blog post details how to precisely replicate the census
bureau's estimates using the acs pums with R

http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29









On Tue, Sep 30, 2014 at 9:17 AM, <Michael.Laviolette at dhhs.state.nh.us>
wrote:

>
> I'm trying to reproduce some results from the American Community Survey
> PUMS data using the "survey" package. I'm using the one-year 2012 estimates
> for New Hampshire
> (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the
> estimates for user verification from
>
> http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv
>
> Once the age groups are set up as specified in the verification estimates,
> the following SAS code produces the correct estimated totals with standard
> errors:
>
> proc surveyfreq data = acs2012 varmethod = jackknife;
>   weight pwgtp;
>   repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
>   table SEX agegroup;
> run;
>
> I've not been successful in reproducing the standard errors with R,
> although they are very close. My code follows; what revisions do I need to
> make?
>
> Thanks,
> Mike L.
>
> # load estimates for verification
> pums_est <- read.csv("pums_estimates_12.csv")
> pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))
>
> # load PUMS data
> pums_p <- read.csv("ss12pnh.csv")
> # convert sex and age group to factors
> pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
> pums_p$agegrp <- cut(pums_p$AGEP,
>                      c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
>                      right = FALSE)
>
> # create replicate-weight survey object
> library(survey)
> pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
>                           weights = pums_p[,7],
>                           combined.weights = TRUE,
>                           type = "Fay", rho = 1 - 1/sqrt(4),
>                           scale = 1, rscales = 1,
>                           data = pums_p)
>
> # using type "JK1" with scale = 4/80 and rscales = rep(1,80)
> #   seems to produce the same results
>
> # total population by sex with SE's
> by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round(by.sex[1,4:5])
> #     se1  se2
> # 33 1606 1606
> # compare results with Census
> pums_est[966:967, 5]
> #[1] 1610 1610
>
> # total population by age group with SE's
> by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round((by.agegrp)[15:27])
> #       se1  se2  se3  se4  se5  se6  se7  se8  se9 se10 se11 se12 se13
> #    33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200  880 1700 1678
> # compare results with Census
> pums_est[968:980, 5]
> #  [1]  874 2578 2613 1463 1399 1476 1493 1555 2191 2200  880 1702 1684
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list