--- title: "Introduction to the `apm` Package" date: "`r Sys.Date()`" output: html_vignette: bibliography: apm_vignette_bib.bib link-citations: true vignette: > %\VignetteIndexEntry{Introduction to the `apm` Package} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown_notangle} editor_options: chunk_output_type: console --- ## Introduction The `apm` package implements *Averaged Prediction Models (APM)*, a Bayesian model averaging approach for controlled pre-post designs. These designs compare differences over time between a group that becomes exposed (treated group) and one that remains unexposed (comparison group). With appropriate causal assumptions, they can identify the causal effect of the exposure/treatment. In APM, we specify a collection of models that predict untreated outcomes. Our causal identifying assumption is that the model's prediction errors would be equal (in expectation) in the treated and comparison groups in the absence of the exposure. This is a generalization of familiar methods like Difference-in-Differences (DiD) and Comparative Interrupted Time Series (CITS). Because many models may be plausible for this prediction task, we combine them using Bayesian model averaging. We weight each model by its robustness to violations of the causal assumption. ## Methodology Overview Our identification framework begins with prediction and correction steps. First, we train a model on pre-intervention data to predict untreated outcomes in the post-intervention period. Then, we correct the treated group’s predictions using the comparison group’s post-intervention prediction errors, which adjusts for shared time-varying shocks. The identifying assumption is that, without the policy change, prediction errors would be equal (in expectation) across treated and comparison groups. We specify a collection of plausible models for this prediction task and compare across them using robustness to the causal assumption. Specifically, we quantify each model's differential prediction errors (i.e., differences between the treated and untreated group's prediction errors) during a series of pre-intervention validation periods. Taking the max as a summary across these periods, we consider models with smaller maximum differential prediction errors more robust. Then we apply Bayesian model averaging (BMA), weighting each model by its posterior probability of being the most robust. Taking an "informal Bayesian approach", we sample model parameters from a quasi-posterior [@gelmanhill2006, p. 140]. This is a multivariate Normal distribution with mean equal to the estimated parameters of the fitted models and a variance-covariance matrix that incorporates across-model correlations. For each parameter draw, we compute the models' differential prediction errors and take the max over the validation periods. Each model's weight is the proportion of draws in which it minimizes the maximum differential prediction error (i.e., is most robust). Finally, using the corrected predictions from our averaged model, we estimate the average treatment effect on the treated (ATT). ![Summary of the APM method](apm_summary/apm_summary.svg){width="100%"} For inference, we apply a fractional weighted bootstrap [@xuetal2020]. It takes into account uncertainty about the models' performance, but not the uncertainty in the BMA weights themselves, which would be computationally infeasible. 
Following @antonellietal2022, we estimate the overall variance as the sum of two components: (1) the sampling variance with model uncertainty held fixed and (2) the model uncertainty variance with sampling uncertainty held fixed.

Finally, we can also perform causal sensitivity analyses by scaling the models' maximum differential prediction error in the validation periods by a factor $M$. This enables sensitivity measures such as:

- constructing sensitivity bounds for a particular $M$, as in @manskipepper2018 and @rambachanroth2023
- finding the value of $M$ that would reverse the sign of the causal effect
- finding the value of $M$ that would lead the confidence interval to include 0

The package implements the APM methods via three key functions:

- `apm_mod()` constructs candidate models that can predict untreated outcomes in both the treated and comparison groups.
- `apm_pre()` fits these candidate models to pre-treatment validation data and generates the BMA weights for each model.
- `apm_est()` estimates the ATT, given the BMA weights, and constructs both statistical and causal bounds around it.

## Example: Estimating the Effect of Missouri's Gun Policy Change

In this example, we apply APM to estimate the effect of Missouri's 2007 repeal of its permit-to-purchase law on gun homicide rates [@websteretal2014; @hasegawaetal2019].

```{r}
library(apm)
```

### Load Example Data

The package provides an example dataset with pre- and post-policy homicide rates:

```{r}
data("ptpdata", package = "apm")

# Inspect the dataset
head(ptpdata)
```

The dataset includes:

- `state`: State name
- `year`: Year of observation
- `deaths`: The number of gun homicide deaths
- `crude_rate`: Gun homicide rate per 100,000
- `age_adj_rate`: Gun homicide rate per 100,000, adjusted for age
- `group`: Indicator for Missouri (1 = Missouri, 0 = comparison group)
- `treat`: Indicator for Missouri in a post-treatment year (2008+) (1 = treated, 0 = untreated)

Note that observations with `year == 2008` are the average of a state's observations over all post-treatment periods (2008-2016).

### Define Candidate Models

APM supports a range of model options:

- `formula_list`: list of model formulas with the outcome on the left-hand side and predictors on the right-hand side, e.g., `formula_list = crude_rate ~ 1`.
- `family`: list of family specifications passed to `stats::glm()` when fitting models in `apm_pre()`; `"negbin"` can also be supplied to request a negative binomial model with a log link fit using `MASS::glm.nb()`. To see the list of family specifications, run `?family`.
- `lag`: vector of integers indicating the outcome lags to be used as predictors. For example, `lag = 3` means to include the lag-1, lag-2, and lag-3 outcomes as predictors. The default is 0 (no lags).
- `diff_k`: vector of integers indicating the outcome lags to be used as offsets. For example, `diff_k = 1` means the prior time point's outcome will be included as an offset, which is equivalent to using the outcome minus its corresponding lag as the model's outcome. The default is 0 (no offset). Any models with a `diff_k` value less than a `lag` value are removed automatically. When used with a family with a log link, lags are automatically log-transformed; `apm_pre()` will return an error if non-positive values are present in the outcome.
- `log`: logical vector indicating whether the outcome should be log-transformed. The default is `FALSE`, which uses the original outcome. When `lag` or `diff_k` are greater than 0, the outcome lags will also be log-transformed if `TRUE`.
When the family has a log link and `diff_k` is greater than 0, the lag in the offset will be log-transformed.
- `time_trend`: vector of integers indicating the powers to be included in a time trend. For example, `time_trend = 2` means to include as predictors the time variable and its square. A value of 0 (the default) means continuous time is not included as a predictor.
- `fixef`: logical vector indicating whether to include unit fixed effects as predictors. The default is `FALSE`.

These lists of model options are combined factorially to create a collection of candidate models. In this example, we specify two options for lags (no lag and lag-1), two options for outcome differences (no offset and the immediate prior outcome), two options for the outcome scale (original and log-transformed), and two options for time trends (no time trend and a linear time trend) to get a set of candidate models:

```{r}
models <- apm_mod(formula_list = crude_rate ~ 1,
                  family = "gaussian",
                  lag = 0:1,
                  diff_k = 0:1,
                  log = c(TRUE, FALSE),
                  time_trend = 0:1,
                  fixef = TRUE)
```

This produces `r length(models)` candidate models, all of which use the `"gaussian"` family (with an identity link function). This is fewer than the full factorial combination because of the embedded logic for combining outcome differences and lags (see above).

### Fit Candidate Models to Pre-Treatment Data

We now fit all `r length(models)` models to the pre-treatment data. For each model and each pre-treatment validation period, we compute the observed difference in average prediction errors between the treated and comparison groups. From these differences in average prediction errors, we compute the Bayesian model averaging (BMA) weights that are eventually passed to `apm_est()` for estimation of the average effect of treatment on the treated (the ATT).

The function `apm_pre()` does the model fitting. It requires a data frame that contains a group indicator variable and a time variable. We specify `1999:2007` as the validation years in which each model will be evaluated. Each validation year's fit is based on all data prior to that year. We therefore set the first validation period to 1999 so that even in the first validation year, we can train the models on five years of data (i.e., `1994:1998`). We specify the number of quasi-posterior draws using the `nsim = 1000` argument; more draws give a better approximation to the model posterior but slow down the computation.

```{r results = "hide"}
# Set seed for reproducibility: ensures random sampling from the
# multivariate Normal produces the same results each time the code is run
set.seed(098556947)

fits <- apm_pre(models = models,
                data = ptpdata,
                group_var = "group",
                time_var = "year",
                unit_var = "state",
                val_times = 1999:2007,
                nsim = 1000)
```

We can view the largest average differential prediction error for each model and the BMA weight given to each model using `summary()` on the returned object:

```{r}
summary(fits)
```

We can plot the simulation-based posterior distribution of which model is most robust. The probabilities are the proportions of simulations in which each model is the winner.

```{r, fig.model_weights, fig.width = 8, fig.height = 5, out.width = "\\textwidth", fig.align = "center", fig.cap = "Bayesian Model Averaging (BMA) Weights for Model Selection."}
plot(fits, type = "weights")
```

We can also see the differential prediction errors for each model and year. In the plot below, the winning model is highlighted: it is the model that includes a lag-1 outcome as a predictor and log-transforms the outcome.

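The winning model can also be located programmatically from the fitted object. The sketch below assumes the observed differential prediction errors are stored in the `pred_error_diffs` component of the `apm_pre()` output, with validation years in rows and one column per model; it is not evaluated here because the exact structure may differ across package versions.

```{r, eval = FALSE}
# Sketch: find the most robust model by hand, assuming `fits$pred_error_diffs`
# holds the observed differential prediction errors (validation years in rows,
# models in columns)
max_abs_err <- apply(abs(fits$pred_error_diffs), 2, max)

which.min(max_abs_err)  # index of the model with the smallest maximum error
min(max_abs_err)        # its maximum absolute differential prediction error
```
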
The maximum differential prediction error was observed in 2005.

```{r, fig.width = 8, fig.height = 8, out.width = "\\textwidth", fig.align = "center"}
plot(fits, type = "errors")
```

The plot below shows the predictions from this model in each validation period. The observed outcomes are displayed as points and the predicted outcomes as lines.

```{r, fig.width = 6, fig.height = 5, out.width = "\\textwidth", fig.align = "center"}
plot(fits, type = "predict")
```

Finally, we can show the winning model's corrected predictions, that is, the treated group's predictions after incorporating the comparison group's prediction error. To view them, we set `type = "corrected"`.

```{r, fig.width = 6, fig.height = 5, out.width = "\\textwidth", fig.align = "center"}
plot(fits, type = "corrected")
```

### Estimation and Inference

To estimate the ATT and conduct inference, we feed the output of a call to `apm_pre()` into `apm_est()`. The `M` argument is the sensitivity parameter for set identification, which by default is set to `M = 0`. When `M` is set to a value greater than 0, `apm_est()` returns estimates of the lower and upper bounds of the ATT. These bounds can incorporate both the uncertainty due to possible violations of the causal assumption and sampling uncertainty. The `R` argument is the number of bootstrap iterations used to estimate the sampling variance, holding model uncertainty fixed.

```{r}
est <- apm_est(fits = fits,
               post_time = 2008,
               M = 1,
               R = 1000,
               all_models = TRUE)
```

To examine the estimates and uncertainty bounds, we run the following. The `level = 0.95` argument specifies the statistical confidence level; to ignore sampling uncertainty, set `level = 0`.

```{r}
summary(est, level = 0.95)
```

The standard error is the square root of the sum of two variances: (1) the estimated sampling variance holding model uncertainty fixed and (2) the estimated model uncertainty variance holding sampling uncertainty fixed. For the `ATT` row, the `CI low` and `CI high` outputs are the lower and upper confidence bounds for the ATT. The `CI low` output for the `M = 1` row is the lower confidence bound of the ATT's lower sensitivity bound, and the `CI high` output for the `M = 1` row is the upper confidence bound of the ATT's upper sensitivity bound.

The figure below shows the estimated ATT under each model plotted against the maximum absolute difference in average prediction errors for that model. The model with the smallest maximum absolute difference in average prediction errors is displayed in red. The size of each point corresponds to the model's BMA weight. Small variation in the ATT estimates (y-axis) across values of the maximum absolute difference in prediction errors (x-axis) suggests that we do not face a stark trade-off between model plausibility and robustness.

```{r, fig.width = 6, fig.height = 5, out.width = "\\textwidth", fig.align = "center"}
plot(est)
```

### Sensitivity Analysis

We can also apply a sensitivity analysis over increasing values of `M`. For example, below we estimate the ATT's bounds under values of `M` from 1 to 2 in increments of 0.25.

```{r}
summary(est, M = seq(from = 1, to = 2, by = 0.25))
```

This output shows that our 95% confidence interval for the ATT's lower bound excludes 0 when `M = 1.25`, but not when `M = 1.5`. To find the exact changepoint value of `M`, we can run the following.

```{r}
robustness_bound(est, level = 0.95)
```

We can also run `robustness_bound()` with `level = 0`, which gives the value of `M` at which the sensitivity bounds (not the statistical confidence bounds) begin to bracket 0.

```{r}
robustness_bound(est, level = 0)
```

As we would expect, the changepoint value of `M` is greater when `level = 0`.

## Interpretation of Results

The BMA point estimate (`M = 0`) is `r round(x = est$BMA_att[1], digits = 2)`, with a standard error of `r round(x = sqrt(est$BMA_var[1]), digits = 2)`, yielding a 95% confidence interval of [`r round(x = summary(est)[1,3], digits = 2)`, `r round(x = summary(est)[1,4], digits = 2)`]. This suggests that Missouri's repeal of its permit-to-purchase law increased the state's gun homicide rate by `r round(x = summary(est)[1,3], digits = 2)` to `r round(x = summary(est)[1,4], digits = 2)` per 100,000 people. Given Missouri's 2007 homicide rate of `r round(x = ptpdata$crude_rate[ptpdata$state == "Missouri" & ptpdata$year == 2007], digits = 2)` per 100,000 people, the estimated increase of `r round(x = est$BMA_att[1], digits = 2)` represents a `r round(x = (est$BMA_att[1]/ptpdata$crude_rate[ptpdata$state == "Missouri" & ptpdata$year == 2007]) * 100, digits = 0)`% rise.

The changepoint value of `M` for the BMA estimator is `r round(x = robustness_bound(est, level = 0), digits = 2)`. That is, if the differential prediction errors were nearly twice what was seen in the validation periods, the point estimate would no longer indicate an increase in the gun homicide rate. Additionally, the lower bound estimator's 95% confidence interval includes zero when `M` reaches `r round(x = robustness_bound(est, level = 0.95), digits = 2)`. That is, at this multiplier of the differential prediction errors seen in the validation periods, our statistical uncertainty bounds around the effect estimate would begin to include 0.

## Conclusion

The `apm` R package implements Averaged Prediction Models (APM), a unified framework for causal inference in controlled pre-post settings. APM generalizes a broad class of prediction-based methods by combining outcome prediction with error correction using a comparison group. The package also incorporates Bayesian model averaging, weighting candidate models by their robustness in the pre-period data.

```{r, include = FALSE}
worst_mod_fit <- apm_pre(models = models[which.max(apply(abs(fits$pred_error_diffs), 2, max))],
                         data = ptpdata,
                         group_var = "group",
                         time_var = "year",
                         unit_var = "state",
                         val_times = 1999:2007,
                         nsim = 10)

worst_mod_est <- apm_est(fits = worst_mod_fit,
                         post_time = 2008,
                         M = 1,
                         R = 10,
                         all_models = TRUE,
                         level = 0)
```

Through an application to Missouri's 2007 permit-to-purchase law repeal, our results suggest that a lagged dependent variable model with unit fixed effects on the log scale was the most robust choice, leading to an estimated increase of `r round(x = est$BMA_att[1], digits = 2)` homicides per 100,000 people. Sensitivity analysis indicates that for the estimated effect to be indistinguishable from zero, assumption violations would need to exceed `r round(x = robustness_bound(est, level = 0), digits = 2)` times the worst pre-period discrepancies, compared to as low as `r round(x = robustness_bound(worst_mod_est, level = 0), digits = 2)` under single-model approaches. Built on a unified identification framework, APM offers a flexible, data-driven approach to causal inference in controlled pre-post settings.

The `apm` package prioritizes model averaging and robustness over reliance on a single "correct" model, while accounting for both sampling and model uncertainty. This gives researchers greater flexibility in model specification while maintaining rigorous, principled inference.

For more details, see:

- [GitHub Repository](https://github.com/ngreifer/apm)
- [Paper on APM Methodology](https://static1.squarespace.com/static/5d54a19a5a1edf0001ea677a/t/67925707ef4f220a4827ce23/1737643784263/apm_preprint.pdf)

------------------------------------------------------------------------

## References

::: {#refs}
:::

```{r include = FALSE}
# Hide session info in vignette
sessionInfo()
```