The Omitted Variable Tool (`OVtool`

) package was designed to assess the sensitivity of research findings to omitted variables when estimating causal effects using propensity score (PS) weighting. This package includes graphics and summary results that will enable a researcher to quantify the impact an omitted variable would have on their results. Burgette et al. (2021) describe the methodology behind the primary function in this package, `ov_sim()`

. This document presents syntax for the implementation of the `ov_sim()`

function and provides an example of how to interpret the packages’ graphical output.

This package is useful in a wide range of applications where researchers want to analyze how sensitive their research findings are to unobserved confounders that were not included in their propensity score and outcome models. It will estimate the potential impact of the unobserved confounders on both the estimated treatment or exposure effects as well as on the statistical significance of an analysis.

This package is demonstrated using a synthetic data set that was derived from a large-scale observational study on youth in substance use treatment. More specifically, it contains a subset of measures from the Global Appraisal of Individual Needs (GAIN) biopsychosocial assessment instrument (Dennis, Titus et al. 2003) from sites that administered two different types of substance use disorder treatments (treatment “A” and treatment “B”). This data set represents 4,000 adolescents. The main goal of this analysis is to understand the relative effectiveness of Treatment A versus Treatment B on mental health outcomes and to assess the potential for an omitted variable to bias the findings. To create our synthetic data set, we used an R package called “synthpop : Bespoke Creation of Synthetic Data in R” (Nowok et al. 2016).

In our synthetic dataset, there are 2,000 adolescents in each treatment group, as indicated in the `treat`

variable. Within this data set there are 28 variables on substance use disorder and mental health outcomes. Variables that end in `_0`

, `_3`

, and `_6`

were collected from interviews at baseline, 3-, and 6-months post-baseline, respectively.

For this tutorial we are interested in the mental health outcome, `eps7p_3`

, emotional problem scale (eps) recorded at three months. In particular, we want to estimate the relative effectiveness of treatment A versus B on reducing emotional problems in youth. `eps7p_3`

ranges from zero to one, where higher values of EPS indicate more emotional problems. Past research has concluded there are many variables that may influence treatment assignment and an adolescents’ emotional problems (Griffin et al., 2020). We utilize a selection of these variables in our propensity score and outcome models and provide their definitions below.

We begin by loading the package. *Note: installing OVtool may require a large number of other package installations if they are not already installed on your machine. We recommend using a fresh R session prior to installation.*

`library(OVtool)`

We can load the synthetic dataset and make our treatment variable a binary indicator of 0’s and 1’s. The OVtool requires the user to specify the treatment indicator as a numeric vector of 1’s and 0’s, where 1 indicates the target treatment group.

```
set.seed(24)
data(sud); sud = data.frame(sud)
$treat = ifelse(sud$treat == "A", 1, 0) sud
```

The relevant variables in this analysis are:

**Treatment indicator**`treat`

: treatment program, a binary indicator where 1 and 0 represent Treatment “A” and Treatment “B”, respectively.**Outcome of interest**`eps7p_3`

: emotional problem scale at 3-months, a continuous variable ranging from 0 to 1. Higher values represent more emotional problems.`eps7p_0`

: emotional problem scale at baseline, a continuous variable ranging from 0 to 1. Like the outcome, higher values represent more emotional problems.`sfs8p_0`

: substance frequency scale 8-item version at baseline, a continuous variable ranging from 0 to 77.5. Higher values represent more frequencies of substance use.`sati_0`

: substance abuse treatment index at baseline, a continuous variable ranging from 0 to 110. Higher values indicate prior involvement in substance use treatment.`ada_0`

: adjusted days abstinent at baseline, a continuous variable ranging from 0 to 90 days.`recov_0`

: indicates whether the adolescent was in recovery at baseline, where 1 is in recovery and 0 is not in recovery.`tss_0`

: traumatic stress scale at baseline, a continuous variable ranging from 0 to 13. Higher values indicate more problems related to memories of traumatic events.`dss9_0`

: depressive symptom scale at baseline, a continuous variable ranging from 0 to 9. Higher values indicate higher levels of depressive symptoms.

For additional details on these variables, please refer to Dennis et al. (2003).

In the next two sections we showcase how our method works with a continuous outcome and binary treatment indicator. In the first analysis, we focus on how to use the `OVtool`

when estimating the average treatment effect (ATE) for the population. In the second, we focus on estimating the average treatment effect on the treated (ATT) population. **Note: One can also use the OVtool to assess sensitivity of findings for binary outcomes. With a binary outcome, the OVtool utilizes residuals linear probability model using observed covariates to generate the empirical CDF. If the residuals contain many ties, other types of sensitivity analyses may be more appropriate.**

The `OVtool`

can either take a vector of weights estimated using any method or a propensity score (`ps`

) object produced by `TWANG`

(Ridgeway et al., 2021). The chunk of code below demonstrates how to specify your propensity score model and generate propensity score weights using the `ps`

function from `TWANG`

. If you already have a column of weights in your dataset, you can skip to step 2.

```
## Create Formula
= as.formula(treat ~ eps7p_0 + sfs8p_0 + sati_0 + ada_0 + recov_0 +
my_formula + dss9_0)
tss_0
## Get weights
# library(twang)
<- twang::ps(my_formula, data = sud, estimand = 'ATE', booster = "gbm",
ps.twang stop.method = "ks.max", verbose=F, ks.exact = T)
```

The following line of code produces a balance table that demonstrates `TWANG`

does a reasonable job of balancing. There are additional diagnostics we could check to ensure we have sufficient balance, but we move on without diving in further because the purpose of this tutorial is to showcase `OVtool`

. See Ridgeway et al. for further information on balance diagnostics.

```
# Check Balance
::bal.table(ps.twang) twang
```

```
#> $unw
#> tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks ks.pval
#> eps7p_0 0.256 0.196 0.219 0.192 0.187 5.937 0.000 0.103 0.000
#> sfs8p_0 11.253 13.134 10.571 12.162 0.054 1.703 0.089 0.045 0.032
#> sati_0 8.233 22.128 2.145 10.658 0.345 11.088 0.000 0.121 0.000
#> ada_0 48.748 33.400 54.236 32.454 -0.166 -5.271 0.000 0.081 0.000
#> recov_0 0.246 0.431 0.240 0.427 0.015 0.479 0.632 0.006 1.000
#> tss_0 2.277 3.525 1.924 3.115 0.106 3.365 0.001 0.043 0.050
#> dss9_0 2.750 2.604 2.638 2.492 0.044 1.390 0.165 0.023 0.666
#>
#> $ks.max.ATE
#> tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks ks.pval
#> eps7p_0 0.239 0.194 0.233 0.192 0.028 0.880 0.379 0.021 0.765
#> sfs8p_0 10.885 12.655 10.727 12.372 0.012 0.392 0.695 0.012 0.998
#> sati_0 5.345 17.937 4.198 15.691 0.065 1.797 0.072 0.023 0.662
#> ada_0 51.455 32.894 52.377 32.898 -0.028 -0.858 0.391 0.020 0.823
#> recov_0 0.248 0.432 0.238 0.426 0.023 0.716 0.474 0.010 1.000
#> tss_0 2.099 3.358 2.065 3.279 0.010 0.313 0.754 0.012 0.999
#> dss9_0 2.675 2.556 2.706 2.535 -0.012 -0.381 0.703 0.012 0.999
```

The last step needed prior to moving onto the specification of your outcome model is to attach the weights as a column in your dataset. We extract the weights produced from `ps`

in the line below.

```
# Get weights (not needed if user inserts a ps object in OVtool::outcome_model)
$w_twang = ps.twang$w$ks.max.ATE sud
```

The next step is to estimate the observed treatment effect. The `outcome_model`

function in `OVtool`

automatically uses the `svydesign`

and `svyglm`

functions from the `survey`

package in R (survey) and formats and returns a list of components that are needed throughout step 3.

There are two options the researcher can take to input the relevant information to get their outcome results using `outcome_model()`

.

- Input a
`ps.object`

from`TWANG`

and a`stop.method`

(e.g.`"ks.max"`

) or - Input a data frame containing the
`data`

used, the column name representing the weights,`weights`

, and the column name representing the treatment indicator,`treatment`

.

The analyst must also provide:

- a column name representing the outcome,
`outcome`

- a vector of covariates to be included in the final outcome model,
`model_covariates`

- the estimand of interest,
`estimand`

```
# Run Models -- first standardize outcome
$eps7p_3_std = sud$eps7p_3/sd(sud$eps7p_3)
sud
# Run outcome model (function in OVtool that calls svyglm from survey)
= outcome_model(ps_object = NULL,
results stop.method = NULL,
data = sud,
weights = "w_twang",
treatment = "treat",
outcome = "eps7p_3_std",
model_covariates = c("eps7p_0", "sfs8p_0",
"sati_0", "ada_0",
"recov_0", "tss_0",
"dss9_0"),
estimand = "ATE")
summary(results$mod_results)
```

```
#>
#> Call:
#> svyglm(formula = formula, design = design_u)
#>
#> Survey design:
#> survey::svydesign(ids = ~1, weights = ~w_orig, data = data)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.4235011 0.0657531 6.441 1.33e-10 ***
#> treat 0.0712001 0.0278454 2.557 0.0106 *
#> eps7p_0 1.7397954 0.1166266 14.918 < 2e-16 ***
#> sfs8p_0 0.0015028 0.0020843 0.721 0.4709
#> sati_0 0.0028393 0.0012348 2.299 0.0215 *
#> ada_0 -0.0005808 0.0007725 -0.752 0.4522
#> recov_0 -0.0704673 0.0317912 -2.217 0.0267 *
#> tss_0 0.0329637 0.0069454 4.746 2.15e-06 ***
#> dss9_0 0.0540816 0.0080153 6.747 1.72e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 0.7074412)
#>
#> Number of Fisher Scoring iterations: 2
```

The outcome model results show an adjusted treatment effect estimate that accounts for confounding from observed covariates between youth in the two treatment programs (A = 1 and B = 0). From the results, we observe that the estimated treatment effect is 0.07 (p = 0.011), whereby youth receiving treatment A have slightly higher emotional problems at the 3-month follow-up than youth in treatment program B.

At this stage, researchers may ask themselves if the observed effect in Step 2 is real and how sensitive to unobserved confounders it is. Our tool is used to help answer this logical next step question.

The following snippet of code presents the main function in `OVtool`

: `ov_sim()`

. This function requires results from `outcome_model()`

plus additional parameters including:

`plot_covariates`

: a vector of column names potentially representing the covariates used to produce the analysts’ propensity score weights (these may or may not be the same as the list of covariates used for the outcome model)`es_grid`

: a vector on an effect size scale representing the range of association between the unobserved confounder (omitted variable) and the treatment indicator over which to run the simulations.`rho_grid`

: a vector of absolute correlations to simulate over. These correlations represent the range of absolute correlations between the omitted variable and the outcome`n_reps`

: the number of repetitions represents the number of times an unobserved confounder is simulated at each effect size and rho combination.**The package defaults to 50. We find that fifty repetitions is sufficient in most cases, but the analyst may need to increase the number of repetitions if the plotted contours are not sufficiently smooth.**

Optional arguments include:

`progress`

: whether or not the function progress should print to screen. The default value is TRUE. If the user does not want the output to print to the console, they should set this parameter to FALSE.`add`

: The default value is FALSE. This is only set to TRUE if the user is running additional repetitions after the first call to`ov_sim`

. It should be used if the original value of`n_reps`

does not yield smooth contours.`sim_archive`

: The default value is NULL and is only used if`add_reps`

is called. This parameter should never be defined by the user.

The `ov_sim`

function does not require the user to specify vectors for `es_grid`

and `rho_grid`

. If the user specifies `NULL`

for both, the tool will automatically suggest appropriate values. We define an effect size value in `es_grid`

to show the strength of the relationship between the simulated unobserved covariate (U) and the treatment group indicator; it is defined as the standardized mean difference in U for the treatment A and treatment B groups. Typical rules of thumb for effect sizes (Cohen’s D) follow such that effect sizes greater than 0.2 would be considered small, 0.5 would be moderate and 0.8 would be large (Cohen, J., 1988). We define a particular value of rho specified in `rho_grid`

as the absolute correlation the unobserved covariate (U) has with the outcome of interest, with larger values indicating stronger relationships between U and the outcome.

Please see Burgette et al. (2021) for additional details on the theory behind `OVtool`

.

```
# Run OVtool (with weights (not a ps object))
= ov_sim(model_results=results,
ovtool_results_twang plot_covariates=c("eps7p_0", "sfs8p_0",
"sati_0", "ada_0",
"recov_0", "tss_0",
"dss9_0"),
es_grid = NULL,
rho_grid = seq(0, 0.40, by = 0.05),
n_reps = 50,
progress = TRUE,
add = FALSE,
sim_archive = NULL)
```

The grid values are not required; if `es_grid`

and/or `rho_grid`

are set to `NULL`

, the tool will calculate reasonable values to simulate over. For this tutorial we set `es_grid`

to `NULL`

. The tool then iterated over all observed covariate associations with the treatment indicator on an effect size scale to derive a reasonable range.

The function, `ov_sim`

, produced a warning saying “You specified a rho grid whose maximum value is less than the maximum absolute correlation at least one observed covariate has with the outcome. The rho grid was automatically expanded to include all `plot_covariates`

specified in the relevant graphics. If you want the rho grid range to remain from 0 to 0.4 then you must exclude the following variables from the `plot_covariates`

argument: `eps7p_0`

, `tss_0`

, `dss9_0`

.” The grid was expanded to ensure all `plot_covariates`

could be seen on the contour plot. If the user does not want the grid expanded, they can leave out the observed covariates used in the propensity score model that have an absolute correlation with the outcome that is greater than the maximum rho value the user specifies (in this example: `eps_7p_0`

, `dss9_0`

, and `tss_0`

).

There are a few methodological assumptions that are important for an analyst to understand.

First, when generating the omitted variable (U), the empirical cumulative distribution function (CDF) for the residuals within each treatment is used. The residuals are generated from the observed outcome model. When multiple ties are present, the tool handles this by randomly ranking multiple ties. For example, imagine a scenario where you have ten observations (shown in rank order) and the first two observations are 0s. The tool will randomly assign the rank of the two zeroes and assign the first observation with 0.1 and the second observation with 0.2 as opposed to assigning the two zeroes a value of 0.2. This process is repeated for each effect size and rho combination `n_reps`

times.

Another key assumption this method draws upon is that the omitted variable is independent from all observed covariates included in the propensity score model.

To visualize our results, the `plot.ov`

function can be called to produce three graphics. The main arguments to `plot.ov`

are as follows:

`x`

: the object returned from the call to`ov_sim`

`col`

: if specified as “color” a color graphic will be produced and if specified as “bw” a black and white graphic will be produced. If and only if`print_graphic`

is set to “1”, the “col” option will produce a heat map`p_contours`

: a vector of p-value contours plotted. The default is 0.01, 0.05, and 0.1. We only recommend changing from the default if the p-value for the raw effect is close to one of these values. This parameter is only used when`print_graphic`

equals “2” or “3”.`print_graphic`

: takes values “1”, “2”, or “3”, representing Figures 1, 2, and 3 described below.The first graphic (Figure 1) plots the treatment effect contours without covariate labels. This graphic can be produced by specifying

`print_graphic`

to “1”. If the user specifies the parameter`col`

as`"color"`

, the contours will overlay a colored heat map.The second graphic (Figure 2) plots the treatment effect contours in black and the p-value contours in red. The p-value contours can be specified with

`p_contours`

. The default values are 0.01, 0.05, and 0.1.The third graphic (Figure 3) plots the treatment effect contours with the p-value contour overlaid and covariate labels (i.e. the column names submitted to

`plot_covariates`

) plotted by their raw rho and effect size.

The following code snippet will produce Figure 1.

`plot.ov(ovtool_results_twang, print_graphic = "1", col = "bw")`

The grid, as shown by the x-axis and y-axis in Figure 1 presents the effect size and rho, respectively.

The y-axis in Figure 1 represents the unobserved confounder’s absolute correlation with the outcome and the x-axis is the association between the unobserved confounder and the treatment indicator on an effect size scale. The black lines represent treatment effect contours that run along the grid. The PS weighted treatment effect of Treatment A versus Treatment B equals 0.071 and is significant with a p-value equal to 0.011. However, looking at this graphic alone will not give us an idea of how sensitive statistical significance is.

`plot.ov(ovtool_results_twang, print_graphic = "2", col = "color")`

```
#> [1] "NOTE: Covariates with absolute correlation with
outcome greater than 0.4: eps7p_0 (Actual: 0.509), tss_0
(Actual: 0.423), dss9_0 (Actual: 0.420)"
```

Figure 2 is a different variation of Figure 1, but now adds p-value contours. This graphic will allow the user to see what treatment effect will switch the significance level at critical p-values (e.g., 0.05). This graphic will now give the user an idea of how sensitive both the treatment effect estimate and the statistical significance are. The analyst can specify what red p-value thresholds they want plotted with the `p_contours`

argument.

`plot.ov(ovtool_results_twang, print_graphic = "3", col = "color")`

```
#> [1] "NOTE: Covariates with absolute correlation with
outcome greater than 0.4: eps7p_0 (Actual: 0.509), tss_0
(Actual: 0.423), dss9_0 (Actual: 0.420)"
```

Figure 3 adds a final dimension (observed covariate labels) to Figure 2. Similar to Figures 1 and 2, plotted at the top of the figure margin is the ps weighted treatment effect (0.071) and associated p-value of 0.011. The solid black contours represent the treatment effect contour lines and the red lines (sometimes dashed) represent the p-value thresholds. The key on the right side of the graphic shows where various p-value cutoff lines are, including p = 0.05. The blue points on the plot represent the observed covariate correlations with the outcome (y-axis) and effect size associations with the treatment indicator (x-axis). For instance, `ada_0`

and the outcome have approximately a 0.18 absolute correlation with the emotional problem scale at three months and an absolute association of approximately 0.17 effect size difference between the two treatment groups (magnitude of its relationship with the treatment indicator). In this case, if an unobserved confounder had a relationship with the outcome (`eps7p_3`

) and the treatment indicator (`treat`

) that was equivalent to `ada_0`

, `tss_0`

, `eps7p_0`

, `sati_0`

, or `dss9_0`

, then the researcher would conclude that their results would likely be sensitive to an unobserved confounder at a p-value threshold of 0.05. All of these observed relationships are sensitive at the 0.10 p-value threshold other than `dss9_0`

. If the blue points all existed in contours greater than the 0.05 p-value contour, then unobserved confounders with similar associations would retain the significant effect and allow the user to conclude that the results are reasonably robust. Overall, the graphic above showcases an example where findings could be highly sensitive to the effects of an unobserved confounder.

*Note: When the outcome model shows a significant effect for all observed covariates, regardless treatment effect direction, we force the sign of the magnitude to go with the direction of the significant effect. The blue points are meant to give the analyst an idea of possible associations (using observed covariates as an indicator) that would change the interpretation of their results.*

These results were produced with 50 simulations (`n_reps`

) of the unobserved confounder. If the contours are not smooth or the user wants to add simulations, they can call `add_reps`

and specify the number (`more_reps`

) of additional simulations. Once this is completed, a user can recreate the plots. Example code to add repetitions:

```
# If you want to add repetitions, run the following line.
# ovtool_results_twang = add_reps(OVtool_results = ovtool_results_twang,
# model_results = results,
# more_reps = 30)
#
# Recreate Graphic
# plot.ov(ovtool_results_twang, print_graphic = "1", col = "bw")
```

Finally, we can interpret this graphic by running the summary command on the `ov`

object:

`summary.ov(object = ovtool_results_twang, model_results = results)`

```
#> [1] "Recommendation for reporting the sensitivity
analyses"
#> [1] "The sign of the estimated effect is expected to
remain consistent when simulated unobserved confounders
have the same strength of associations with the treatment
indicator and outcome that are seen in 6 of the 7 observed
confounders. In the most extreme observed case in which the
sign changes, the estimated treatment effect shifts from
0.071 to -0.025. The sign of the estimate would not be
expected to be preserved for unobserved confounders that
have the same strength of association with the treatment
indicator and outcome as eps7p_0."
#> [1] "Statistical significance at the 0.05 level is
expected to be robust to unobserved confounders with
strengths of associations with the treatment indicator and
outcome that are seen in 2 of the 7 observed confounders.
In the most extreme observed case, the p-value would be
expected to increase from 0.011 to 0.372. Significance at
the 0.05 level would not be expected to be preserved for
unobserved confounders that have the same strength of
association with the treatment indicator and outcome as
eps7p_0, sati_0, ada_0, tss_0, dss9_0."
```

**The OVtool gives a recommendation on how to report findings regarding the direction of the treatment effect and statistical significance. An analyst could take the results produced by summary.ov() and adapt the desciption to be included a manuscript.** The tool produces these results by iterating through each observed confounder the user specified in the

`plot_covariates`

argument to `ov_sim`

and runs the simulation for those exact effect size and rho combinations to produce the treatment effect and p-value. It then determines which observed relationships would produce a treatment effect that changes direction and p-value that crosses over the 0.05 threshold when a significant raw effect is observed.In the next section, we will show how our method works with the average treatment effect on the treated (ATT) using a continuous outcome.

Perhaps the user is interested in the average treatment effect on the treated. The tool operates very similarly in the ATT setting. We will briefly show the steps needed to run the ATT version of the tool for continuous treatments below. Largely the only difference between ATE and ATT is the user must specify the correct estimand when (1) generating their weights and (2) running their outcome model.

```
## Create formula - here we use same formula as in the ATE case.
= as.formula(treat ~ eps7p_0 + sfs8p_0 + sati_0 + ada_0 + recov_0 +
my_formula + dss9_0)
tss_0
# Propensity score weights:
<- twang::ps(my_formula, data = sud, estimand = 'ATT',
ps.twang_att booster = "gbm", stop.method = "ks.max",
verbose=F, ks.exact = T)
::bal.table(ps.twang_att) twang
```

```
#> $unw
#> tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks ks.pval
#> eps7p_0 0.256 0.196 0.219 0.192 0.185 5.937 0.000 0.103 0.000
#> sfs8p_0 11.253 13.134 10.571 12.162 0.052 1.703 0.089 0.045 0.032
#> sati_0 8.233 22.128 2.145 10.658 0.275 11.088 0.000 0.121 0.000
#> ada_0 48.748 33.400 54.236 32.454 -0.164 -5.271 0.000 0.081 0.000
#> recov_0 0.246 0.431 0.240 0.427 0.015 0.479 0.632 0.006 1.000
#> tss_0 2.277 3.525 1.924 3.115 0.100 3.365 0.001 0.043 0.050
#> dss9_0 2.750 2.604 2.638 2.492 0.043 1.390 0.165 0.023 0.666
#>
#> $ks.max.ATT
#> tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks ks.pval
#> eps7p_0 0.256 0.196 0.249 0.193 0.034 1.023 0.306 0.025 0.643
#> sfs8p_0 11.253 13.134 10.935 12.623 0.024 0.725 0.469 0.020 0.881
#> sati_0 8.233 22.128 6.399 19.437 0.083 1.840 0.066 0.036 0.207
#> ada_0 48.748 33.400 50.324 33.268 -0.047 -1.365 0.172 0.028 0.489
#> recov_0 0.246 0.431 0.234 0.424 0.027 0.810 0.418 0.012 1.000
#> tss_0 2.277 3.525 2.226 3.440 0.015 0.399 0.690 0.016 0.976
#> dss9_0 2.750 2.604 2.790 2.582 -0.016 -0.446 0.655 0.014 0.992
```

Notice that the only difference here is we use `estimand = ATT`

to estimate the propensity score weights using TWANG. We observe reasonable balance; all of our standardized effect size differences are less than 0.1. As mentioned in the ATE example, refer to Ridgeway et al. for further information on balance diagnostics.

The following code snippet displays how a user would insert a `ps`

object produced by `twang`

into `outcome_model`

and specify the appropriate stopping method to extract the propensity score weights as opposed to utilizing the `weights`

argument. Other than this, the only difference in the call to `outcome_model`

from the ATE example is specifying the correct `estimand`

.

```
= outcome_model(ps_object = ps.twang_att,
results_att stop.method = "ks.max",
data = sud,
weights = NULL,
treatment = "treat",
outcome = "eps7p_3_std",
model_covariates = c("eps7p_0", "sfs8p_0",
"sati_0", "ada_0",
"recov_0", "tss_0",
"mhtrt_0", "dss9_0"),
estimand = "ATT")
summary(results_att$mod_results)
```

```
#>
#> Call:
#> svyglm(formula = formula, design = design_u)
#>
#> Survey design:
#> survey::svydesign(ids = ~1, weights = ~w_orig, data = data)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.3492268 0.0669257 5.218 1.90e-07 ***
#> treat 0.0641723 0.0304310 2.109 0.0350 *
#> eps7p_0 1.6430733 0.1242182 13.227 < 2e-16 ***
#> sfs8p_0 0.0026741 0.0021297 1.256 0.2093
#> sati_0 0.0020878 0.0012827 1.628 0.1037
#> ada_0 0.0002505 0.0007861 0.319 0.7500
#> recov_0 -0.0788224 0.0336288 -2.344 0.0191 *
#> tss_0 0.0344397 0.0074878 4.599 4.37e-06 ***
#> mhtrt_0 0.2468881 0.0370127 6.670 2.90e-11 ***
#> dss9_0 0.0481264 0.0081534 5.903 3.88e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 0.7229591)
#>
#> Number of Fisher Scoring iterations: 2
```

Under the ATT setting, our treatment effect is 0.064 with a p-value of 0.035.

We now want to understand how sensitive our result is to unobserved confounders. Below we will run the `ov_sim`

function; we notice that in our call to `ov_sim`

the parameters remain the same as in our ATE example. The `estimand`

parameter is stored in `results_att`

and carried forward throughout the remainder of the analysis.

```
= ov_sim(model_results=results_att,
ovtool_results_twang_att plot_covariates=c("eps7p_0", "sfs8p_0",
"sati_0", "ada_0",
"recov_0", "tss_0",
"dss9_0"),
es_grid = NULL,
rho_grid = seq(0, 0.40, by = 0.05),
n_reps = 50,
progress = TRUE)
```

Occasionally the specified number of repetitions (default is `n_reps = 50`

) is not sufficient. Evidence that `n_reps`

is not sufficient are contours that don’t appear smooth. The number of repetitions needed can vary based on sample size, the distribution of the outcome, and the specified estimand. Once we observe our graphical results, if the contours do not look smooth, we recommend calling the `add_reps`

function and specifying `more_reps`

to the number of additional simulations of the unobserved confounder you desire.

`plot.ov(ovtool_results_twang_att, print_graphic = "1", col = "bw")`

The contours shown in Figure 4 are relatively smooth but for sake of example we show how a user would specify 10 additional repetitions. The user can specify `more_reps`

to any number they choose in the following function call, `add_reps()`

.

```
# If you want to add repetitions, run the following line.
= add_reps(OVtool_results = ovtool_results_twang_att,
ovtool_results_twang_att model_results = results_att,
more_reps = 10)
```

Once the number of repetitions is finalized, the user can visualize the results through the same three function calls as described in the ATE section above.

To review, the graphics visualize (1) treatment effect contours, (2) treatment effect contours in black and p-value contours in red, and (3) treatment effect and p-value contours with covariate labels added, respectively.

`plot.ov(ovtool_results_twang_att, print_graphic = "1", col = "bw")`

`plot.ov(ovtool_results_twang_att, print_graphic = "2", col = "color")`

```
#> [1] "NOTE: Covariates with absolute correlation with
outcome greater than 0.3: eps7p_0 (Actual: 0.509), tss_0
(Actual: 0.423), dss9_0 (Actual: 0.420)"
```

`plot.ov(ovtool_results_twang_att, print_graphic = "3", col = "color")`

```
#> [1] "NOTE: Covariates with absolute correlation with
outcome greater than 0.3: eps7p_0 (Actual: 0.509), tss_0
(Actual: 0.423), dss9_0 (Actual: 0.420)"
```

Finally, we will run the `summary.ov`

function to receive the tool’s recommendations.

`summary.ov(object = ovtool_results_twang_att, model_results = results_att)`

```
#> [1] "Recommendation for reporting the sensitivity
analyses"
#> [1] "The sign of the estimated effect is expected to
remain consistent when simulated unobserved confounders
have the same strength of associations with the treatment
indicator and outcome that are seen in 6 of the 7 observed
confounders. In the most extreme observed case in which the
sign changes, the estimated treatment effect shifts from
0.064 to -0.04. The sign of the estimate would not be
expected to be preserved for unobserved confounders that
have the same strength of association with the treatment
indicator and outcome as eps7p_0."
#> [1] "Statistical significance at the 0.05 level is
expected to be robust to unobserved confounders with
strengths of associations with the treatment indicator and
outcome that are seen in 1 of the 7 observed confounders.
In the most extreme observed case, the p-value would be
expected to increase from 0.035 to 0.583. Significance at
the 0.05 level would not be expected to be preserved for
unobserved confounders that have the same strength of
association with the treatment indicator and outcome as
eps7p_0, sfs8p_0, sati_0, ada_0, tss_0, dss9_0."
```

The text produced states that the sign of the estimated effect is expected to remain consistent when simulated unobserved confounders have the same strength of associations with the treatment indicator and outcome that are seen in six of the seven observed confounders. On the other hand, statistical significance at the 0.05 alpha level is robust to unobserved confounders with strengths of associations with treatment indicator and outcome that are seen in only one of our observed covariates.

In observational studies, understanding the sensitivity of reported results to unobserved confounders is a key step when interpreting the strength of evidence. This package aims to provide user-friendly tools to assess the sensitivity of research findings to omitted variables when estimating causal effects using PS weighting.

This tutorial and the development of the `OVtool`

R package were supported by funding from grant 1R01DA034065 from the National Institute on Drug Abuse. The overarching goal of the grant is to develop statistical methods and tools that will provide addiction health services researchers and others with the tools and training they need to study the effectiveness of treatments using observational data. The work is an extension of the Toolkit for Weighting and Analysis of Nonequivalent Groups, or TWANG, which contains a set of functions to support causal modeling of observational data through the estimation and evaluation of propensity score weights. The TWANG package was first developed in 2004 by RAND researchers for the R statistical computing language and environment and has since been expanded to include tools for SAS, Stata, and Shiny. For more information about TWANG and other causal tools being developed, see www.rand.org/statistics/twang.

RAND Social and Economic Well-Being is a division of the RAND Corporation that seeks to actively improve the health and social and economic well-being of populations and communities throughout the world. This research was conducted in the Social and Behavioral Policy Program within RAND Social and Economic Well-Being. The program focuses on such topics as risk factors and prevention programs, social safety net programs and other social supports, poverty, aging, disability, child and youth health and well-being, and quality of life, as well as other policy concerns that are influenced by social and behavioral actions and systems that affect well-being. For more information, email sbp@rand.org.

We would like to acknowledge the Center for Substance Abuse Treatment (CSAT), Substance Abuse and Mental Health Services Administration (SAMHSA). The authors thank these agencies, grantees, and their participants for agreeing to share their data to support creation of the synthetic dataset used in this analysis. This tutorial uses a synthetic dataset of youth receiving two unidentified treatments from the GAIN; running on the true dataset will produce different results.

We also would like to thank our beta testers: Irineo Cabreros, Marika Booth, Andreas Markoulidakis, Brooke Hunter, and Yajna Chakraborti. Their comments, feedback, and suggestions were invaluable throughout the development of this tool.

Burgette, L., Griffin, B. A., Pane, J. D., and McCaffrey, D. M. (2021). Got a feeling something is missing? Assessing Sensitivity to Omitted Variables in Substance Use Treatment. arXiv.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Dennis, M. L., Titus, J. C., White, M. K., Unsicker, J. I., & Hodgkins, D. (2003). Global appraisal of individual needs: Administration guide for the GAIN and related measures. Bloomington, IL: Chestnut Health Systems.

Griffin, B.A., Ayer, L., Pane, J., Vegetabile, B.G., Burgette, L.F., McCaffrey, D.M., Coffman, D.L., Cefalu, M., Funk, R., Godley, M. Expanding outcomes when considering the relative effectiveness of two evidence-based outpatient treatment programs for adolescents. 2020. Journal of Substance Abuse Treatment, 118, 108075. PMCID: PMC519172

Lumley, T (2020). “survey: analysis of complex survey samples.” R package version 4.0.

McCaffrey, D. F., Ridgeway, G., and Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological methods 9, 403.

Nowok B, Raab GM, Dibben C (2016). “synthpop: Bespoke Creation of Synthetic Data in R.” Journal of Statistical Software, 74(11), 1–26. doi: 10.18637/jss.v074.i11.

Ridgeway, G., McCaffrey, D., Morral, A., Cefalu, M., Burgette, L., and Griffin, B. A. (2021). Toolkit for Weighting and Analysis of Nonequivalent Groups: A tutorial for the twang package. Santa Monica, CA: RAND Corporation.