--- title: "Introduction to Double DiD" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{double} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Here we show how to deal with confounding events, see [Tsai (2024)](https://arxiv.org/abs/2409.05184) for the full treatment. We first simulate some data with confounding events. ```{r setup} library(fastdid) library(ggplot2) library(data.table) simdt <- sim_did(1e+04, 5, cov = "cont", hetero = "all", balanced = TRUE, seed = 1, second_cohort = TRUE, second_het = "no") #comfounding event dt <- simdt$dt #dataset #ground truth att att <- simdt$att |> merge(dt[,.(w = .N),by = "G"], by = "G") att[, event_time := time-G] att <- att[event == 1,.(att = weighted.mean(attgt, w)), by = "event_time"] ``` Using the default estimator, the estimates (black line) is biased from the ground truth (red line). ```{r, naive} naive_result <-fastdid(data = dt, timevar = "time", cohortvar = "G", unitvar = "unit", outcomevar = "y", result_type = "dynamic") plot_did_dynamics(naive_result) + geom_line(aes(y = att, x = event_time), data = att, color = "red") + theme_bw() ``` Diagnostics can be obtained by using `time >= G2` as outcome. As we set the effect of the confounding event at 10 constantly, we can see that the bias of the default estimator is roughly 10 times the diagnostics. ```{r, diag} dt[, D2 := time >= G2] diag <- fastdid(data = dt, timevar = "time", cohortvar = "G", unitvar = "unit", outcomevar = "D2", result_type = "dynamic") plot_did_dynamics(diag) + theme_bw() ``` Using the double did estimator with `cohortvar2`, the estimator recovers the ground truth. ```{r, double} double_result <-fastdid(data = dt, timevar = "time", cohortvar = "G", unitvar = "unit", outcomevar = "y", result_type = "dynamic", cohortvar2 = "G2", event_specific = TRUE) plot_did_dynamics(double_result) + geom_line(aes(y = att, x = event_time), data = att, color = "red") + theme_bw() ``` Double DiD also allow for two additional aggregation scheme: group-group-time ("group_group_time") and dynamic-staggered ("dynamic_stagger", event time by event stagger, G1-G2). ```{r, agg} double_result_ds <-fastdid(data = dt, timevar = "time", cohortvar = "G", unitvar = "unit", outcomevar = "y", result_type = "dynamic_stagger", cohortvar2 = "G2", event_specific = TRUE) double_result_ggt <-fastdid(data = dt, timevar = "time", cohortvar = "G", unitvar = "unit", outcomevar = "y", result_type = "group_group_time", cohortvar2 = "G2", event_specific = TRUE) ```