cpsvote: A Social Science Toolbox for Using the Current Population Survey’s Voting and Registration Supplement

cpsvote helps you work with data from the Current Population Survey’s (CPS) Voting and Registration Supplement (VRS), published by the U.S. Census Bureau and Bureau of Labor Statistics. This high-quality, large-sample survey has been conducted after every federal election (in November of even years) since 1964, surveying Americans about their voting practices and registration. The raw data, archived by the National Bureau of Economic Research, is spread across several fixed-width files with different question locations and formats.
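To see why the fixed-width layout makes the raw files tedious, here is a minimal base R sketch. It reads the same two items from two files whose layouts differ and stacks them into one table. The records, the column positions, and the read_year() helper are all made up for illustration. cpsvote keeps its own column specification (passed to cps_read() as cols = cpsvote::cps_cols) and does not necessarily work this way internally.

```r
# Made-up raw records for two survey years; these layouts are hypothetical and
# only illustrate that the same items can sit at different positions each year
raw_2016 <- c("411", "532")       # STATE in columns 1-2, VOTE in column 3
raw_2018 <- c("1  41", "2  53")   # VOTE in column 1, STATE in columns 4-5

# Hypothetical per-year layouts (a negative width skips filler columns)
layouts <- list(
  "2016" = list(widths = c(2, 1),     names = c("STATE", "VOTE")),
  "2018" = list(widths = c(1, -2, 2), names = c("VOTE", "STATE"))
)

# Read one year's fixed-width lines and return columns in a common order
read_year <- function(lines, layout, year) {
  path <- tempfile(fileext = ".dat")
  writeLines(lines, path)
  df <- utils::read.fwf(path, widths = layout$widths,
                        col.names = layout$names,
                        colClasses = "character")
  unlink(path)
  df$YEAR <- year
  df[c("YEAR", "STATE", "VOTE")]
}

combined <- rbind(
  read_year(raw_2016, layouts[["2016"]], 2016),
  read_year(raw_2018, layouts[["2018"]], 2018)
)
combined
```

Reading every field as character keeps codes with leading zeros (such as a state code "06") intact until they are deliberately converted or labeled.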

This package consolidates common questions and provides the data in a structure that is much easier to work with and interpret, since much of the basic factor recoding has already been done. We also calculate alternative sample weights that account for documented changes in non-response bias over the decades, an adjustment several elections researchers recommend as best practice. Documentation of this reweighting is provided in vignette("voting").
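One way to picture this kind of adjustment, assumed here for illustration only: within each state, rescale voters' and non-voters' weights so that weighted turnout matches an external benchmark (for example, official turnout among eligible voters), while keeping the state's total weight unchanged. This is a sketch of the general idea, not cpsvote's implementation; vignette("voting") documents what the package actually does. The data, the benchmark numbers, and the reweight_to_target() helper are hypothetical.

```r
# Toy respondent-level data (hypothetical values, not real CPS records)
toy <- data.frame(
  state  = c("OR", "OR", "OR", "OR", "WA", "WA", "WA"),
  voted  = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  weight = c(1.0, 1.5, 1.0, 2.0, 1.0, 1.0, 2.0)
)

# Hypothetical benchmark turnout rates by state (illustrative numbers only)
target <- c(OR = 0.60, WA = 0.55)

reweight_to_target <- function(df, target) {
  out <- df
  for (s in unique(df$state)) {
    rows <- df$state == s
    # Weighted turnout that the survey itself implies for this state
    p_hat <- weighted.mean(df$voted[rows], df$weight[rows])
    p <- target[[s]]
    # Voters are scaled by p / p_hat and non-voters by (1 - p) / (1 - p_hat);
    # this assumes 0 < p_hat < 1 in every state
    adj_factor <- ifelse(df$voted[rows], p / p_hat, (1 - p) / (1 - p_hat))
    out$weight[rows] <- df$weight[rows] * adj_factor
  }
  out
}

adj <- reweight_to_target(toy, target)

# Weighted turnout by state after the adjustment: OR 0.60, WA 0.55
adj_turnout <- sapply(split(adj, adj$state),
                      function(d) weighted.mean(d$voted, d$weight))
adj_turnout
```

Because the two scaling factors are chosen together, each state's total weight (its estimated population) is the same before and after the adjustment; only the split between voters and non-voters moves.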

We have provided access to VRS data from 1994 to 2018, and anticipate updating the package when 2020 data becomes available.

Installing and Loading the Package

Version 0.1 is on CRAN!

install.packages('cpsvote')
library(cpsvote)

You can also install the development version from our GitHub repository.

remotes::install_github("Reed-EVIC/cpsvote")
library(cpsvote)

Basic Use (AKA Tips if You Don’t Like Reading Documentation)

We have written several functions to transform the VRS from its original format into a more workable structure. The easiest way to access the data is with the cps_load_basic() command:

# Load All Years
# May take some time to download and process files the first time! 
cps <- cps_load_basic()  
# Just load 2006 and 2008
cps <- cps_load_basic(years = c(2006, 2008))

This will load the prepared VRS data into your environment as a tibble called cps. The first time you load a given year of data, the raw data file is downloaded to your computer (by default to the relative path “./cps_data”), which can take some time depending on your internet speed. On later loads, R reads the files that have already been downloaded (by default from the same “cps_data” folder), as long as you correctly specify where they are stored. See ?cps_allyears_10k for a description of the columns and fields that cps_load_basic() outputs.

We recommend either using a single R project for your CPS analysis where these files can be stored (this works with the default options), or storing one set of CPS files in a fixed location and specifying that absolute file path each time you load the data. If you specify a location that does not contain the correct files, these functions will attempt to re-download the data from NBER, which takes noticeable time and storage space.
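The download-once behavior described above follows a common caching pattern: check the folder first, and download only when the file is missing. The sketch below shows that pattern in base R. The fetch_once() helper, the file name, and the fake download function are hypothetical and are not part of cpsvote. They only show why pointing at the wrong folder leads to a second download.

```r
# Hypothetical helper (not part of cpsvote): download only if the file is missing
fetch_once <- function(path, download_fun) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download_fun(path)
  }
  path
}

# Stand-in for a real download, e.g. a wrapper around utils::download.file()
n_downloads <- 0
fake_download <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines("raw fixed-width records", dest)
}

data_dir <- tempfile("cps_data")   # a fresh folder standing in for "./cps_data"
f <- file.path(data_dir, "example_raw_file.dat")

fetch_once(f, fake_download)   # first call: downloads
fetch_once(f, fake_download)   # second call: reuses the file on disk
n_downloads                    # 1

# Pointing at a different (empty) folder triggers a fresh download
fetch_once(file.path(tempfile("elsewhere"), "example_raw_file.dat"), fake_download)
n_downloads                    # 2
```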

We have also included a 10,000-row sample of the full VRS data, which comes with the package as cps_allyears_10k. This is particularly useful for planning out a given analysis before you download the full data sets.

library(dplyr)
data("cps_allyears_10k")

cps_allyears_10k %>%
  select(1:3, VRS_VOTE:VRS_REG, VRS_VOTEMETHOD_CON, turnout_weight) %>%
  sample_n(10)

The CPS has survey weights that are necessary to calculate accurate estimates about the US population. Two R packages that work with survey weighting are survey and srvyr (a tidyverse-compatible wrapper for survey). You can see more examples and details on weighting in vignette("voting"), but here is one example of using srvyr to calculate state-level voter turnout among eligible voters in 2018.

library(srvyr)

cps18_weighted <- cps_load_basic(years = 2018, datadir = here::here('cps_data')) %>%
  as_survey_design(weights = turnout_weight)

turnout18 <- cps18_weighted %>%
  group_by(STATE) %>%
  summarize(turnout = survey_mean(hurachen_turnout == "YES", na.rm = TRUE))

head(turnout18, 10)

These estimates closely follow Dr. Michael McDonald’s estimates of turnout among eligible voters in the November 2018 General Election. For a detailed examination of how non-response bias has affected the use of CPS for estimating turnout, see vignette("voting"). We thank the U.S. Elections Project at the University of Florida for the turnout estimates.
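To see what the weights change, here is a small base R sketch on made-up respondents. It compares the unweighted share of "YES" answers with the weighted share, where each weight is the number of people a respondent represents. The weighted share is the point estimate that a weighted mean such as survey_mean() reports for a proportion. srvyr and survey also supply standard errors, which this sketch does not attempt. The data and column names are hypothetical.

```r
# Hypothetical respondents; each weight is the number of people a respondent represents
resp <- data.frame(
  state  = c("A", "A", "A", "B", "B"),
  vote   = c("YES", "NO", "YES", "NO", "YES"),
  weight = c(100, 300, 100, 200, 200)
)

# Unweighted share answering "YES" in each state
unweighted_share <- tapply(resp$vote == "YES", resp$state, mean)

# Weighted share: total weight of "YES" respondents divided by total weight
weighted_share <- sapply(split(resp, resp$state), function(d) {
  sum(d$weight[d$vote == "YES"]) / sum(d$weight)
})

unweighted_share   # A 0.6666667, B 0.5
weighted_share     # A 0.4,       B 0.5
```

In state A the single "NO" respondent represents as many people as the two "YES" respondents combined, so ignoring the weights overstates turnout there.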

Advanced Use

In addition to the basic function listed above, you can customize several steps in the process of reading in the VRS data. If you’ve worked with the CPS before, you may already have some code to read in and analyze this survey data. We still hope that this package can help you organize your workflow or ease some of the more tedious steps necessary to work with the CPS.

Be sure to refer to the CPS documentation files when working with alternative versions of the VRS data. We have included the function cps_download_docs() to download the documentation that matches each year of data. These files are all PDFs (and several are image-based rather than text-based), so they are not easy to search.

cps_load_basic() is a wrapper for several constituent steps that have their own parameters and assumptions. We’ve detailed the changes made to get from the raw data file to the cleaned file in vignette("add-variables").

cps_download_data(path = "cps_data",
                  years = seq(1994, 2018, 2))
cps_download_docs(path = "cps_data",
                  years = seq(1994, 2018, 2))

cps_read(years = seq(1994, 2018, 2),
         dir = "cps_data",
         cols = cpsvote::cps_cols,
         names_col = "new_name",
         join_dfs = TRUE) %>%
    cps_label(factors = cpsvote::cps_factors,
              names_col = "new_name",
              na_vals = c("-1", "BLANK", "NOT IN UNIVERSE"),
              expand_year = TRUE,
              rescale_weight = TRUE,
              toupper = TRUE) %>%
    cps_refactor(move_levels = TRUE) %>%
    cps_recode_vote(vote_col = "VRS_VOTE",
                    items = c("DON'T KNOW", "REFUSED", "NO RESPONSE")) %>%
    cps_reweight_turnout()
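The cps_recode_vote() step above names three non-answers: "DON'T KNOW", "REFUSED", and "NO RESPONSE". How these are handled matters a great deal for turnout estimates. The Census Bureau's published turnout rates count such respondents as non-voters, while some researchers argue they should be treated as missing. The base R sketch below compares the two conventions on made-up responses. It does not reproduce cps_recode_vote(). See ?cps_recode_vote for which convention(s) the package applies and which columns it creates.

```r
# Hypothetical vote responses, including non-answers
vote <- c("YES", "NO", "DON'T KNOW", "REFUSED", "NO RESPONSE", "YES")
non_answers <- c("DON'T KNOW", "REFUSED", "NO RESPONSE")

# Convention 1: count non-answers as non-voters
as_nonvoter <- ifelse(vote %in% non_answers, "NO", vote)

# Convention 2: treat non-answers as missing
as_missing <- ifelse(vote %in% non_answers, NA_character_, vote)

turnout_nonvoter <- mean(as_nonvoter == "YES")               # 2/6, about 0.333
turnout_missing  <- mean(as_missing == "YES", na.rm = TRUE)  # 2/3, about 0.667
```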

You can combine the cps_*() functions above in different ways to customize how the CPS data are read in. For example, this code would load the 2014 VRS data with the original column names and numeric data.

cps14 <- cps_read(2014, names_col = "cps_name")

You can then apply factor labels to this data.

cps14_lab <- cps_label(cps14, names_col = "cps_name")

Note that some features (like cps_refactor()) won’t work on certain customized versions of the data, because they are hard-coded to specific column names. For example, correcting “HIPSANIC” to “HISPANIC” only works if you know which column represents the Hispanic flag. Feel free to copy the code from functions like this and adapt it to your own column names.
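As an example of that kind of adaptation, here is a minimal base R sketch that renames a single misspelled factor level in whatever column holds the flag in your data. The column name hisp_flag, the "NOT HISPANIC" label, and the fix_level() helper are hypothetical and are not taken from cps_refactor().

```r
# Hypothetical column name; in your data the Hispanic flag may be called something else
df <- data.frame(hisp_flag = factor(c("HIPSANIC", "NOT HISPANIC", "HIPSANIC")))

# Rename one factor level in place; the other levels and the values are untouched
fix_level <- function(x, from, to) {
  levels(x)[levels(x) == from] <- to
  x
}

df$hisp_flag <- fix_level(df$hisp_flag, "HIPSANIC", "HISPANIC")
levels(df$hisp_flag)   # "HISPANIC" "NOT HISPANIC"
```

Renaming the level (rather than recoding every value) keeps the factor's underlying codes unchanged, so any counts or cross-tabs built on it stay the same apart from the corrected label.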

Examples, Background Reading, and Data Sources

Acknowledgements

The cpsvote package was originally created at the Early Voting Information Center (EVIC) at Reed College. We are grateful to the Elections Team at the Democracy Fund and to Reed College for supporting the work of EVIC.