make_cate

library(OPL)

#make_cate# Predicting conditional average treatment effect (CATE) on a new policy based on the training over an old policy

##Description## make_cate is a function generating conditional average treatment effect (CATE) for both a training dataset and a testing (or new) dataset related to a binary (treated vs. untreated) policy program. It provides the main input for running opl_tb (optimal policy learning of a threshold-based policy), opl_tb_c (optimal policy learning of a threshold-based policy at specific thresholds), opl_lc (optimal policy learning of a linear-combination policy), opl_lc_c (optimal policy learning of a linear-combination policy at specific parameters), opl_dt (optimal policy learning of a decision-tree policy), opl_dt_c (optimal policy learning of a decision-tree policy at specific thresholds and selection variables). Based on Kitagawa and Tetenov (2018), the main econometrics supported by these commands can be found in Cerulli (2022).

##Function Syntax##

make_cate(model, train_data, test_data, w, x, y, family = gaussian(), ntree = 100, mtry = 2)

##Arguments:## - model: A string indicating the model to use. Valid options are “glm” and “rf”. - train_data: The training dataset used for estimating the treatment effect on the old policy. - test_data: The test dataset used for estimating the treatment effect on the new policy. - w: The treatment variable. - x: Independent variables for the model. - y: The outcome variable. - family: The family type for the model (e.g., “binomial”, “gaussian”). - ntree: Number of trees for the Random Forest model (default is 100).

##Return Value## An object containing the estimated causal treatment effect results, including: - Average Treatment Effect (ATE) - Average Treatment Effect on Treated (ATET) - Average Treatment Effect on Non-Treated (ATENT)

##Example Usage##

set.seed(42)
train_data <- data.frame(
  y = rnorm(100),    # Outcome
  x1 = rnorm(100),   # Covariate
  x2 = rnorm(100),
  w = sample(0:1, 100, replace = TRUE)  # Trattamento
)

test_data <- data.frame(
  y = rnorm(100),    # Outcome
  x1 = rnorm(100),   # Covariate
  x2 = rnorm(100),
  w = sample(0:1, 100, replace = TRUE)  # Trattamento
)

  x <- c("x1", "x2")  # le covariate
  y <- "y"            # la variabile dipendente
  w <- "w"            # la variabile di trattamento
  family <- "gaussian"  # Famiglia per glm
  ntree <- 100          # Numero di alberi per random forest
  mtry <- 2             # Numero di variabili da considerare in ogni split
  
result <- make_cate(model = "glm", train_data = train_data, test_data = test_data, w = w, x = x, y = y)
#>  --------------------------------------------------
#>  - Treatment-effects estimation: OLD POLICY       -
#>  --------------------------------------------------
#>  Number of obs = 100 
#>  Estimator = regression adjustment 
#>  Outcome model = linear 
#>  DIM = 0.0705516 
#>  ATE = 0.0490622 
#>  ATET = 0.0889665 
#>  ATENT = 0.0399044 
#>  --------------------------------------------------
#>  - Treatment-effects estimation: NEW POLICY       -
#>  --------------------------------------------------
#>  Number of obs = 100 
#>  Estimator = regression adjustment 
#>  Outcome model = linear 
#>  DIM = 0.3071428 
#>  ATE = 0.0196866 
#>  ATET = 0.0887607 
#>  ATENT = 0.0690741

##Detailed Steps:## - Train and Test Data: The function separates the data into treated and untreated groups for both the training and test datasets. - Model Estimation: The function estimates the treatment effect using either a GLM or Random Forest model. It then calculates the predicted outcome for both the treated and untreated groups. - Causal Effect Calculation: The CATE is calculated as the difference in predicted outcomes between the treated and untreated groups. - Output: The function returns the estimated treatment effects (ATE, ATET, ATENT) and the treatment effects for both the training and test datasets.

##Results## The following results are output for both the old (training) and new (test) policy:

- Treatment-effects estimation: OLD POLICY -
Number of obs = [n] Estimator = regression adjustment Outcome model = linear DIM = [DIM_value] ATE = [ATE_value] ATET = [ATET_value] ATENT = [ATENT_value]

##Conclusion## The make_cate function is a powerful tool for estimating the causal treatment effect of a policy using either GLM or Random Forest models. It provides insights into the treatment effects both for the old and new policies, making it a useful method for causal inference in policy analysis.