--- title: "user_sample_1" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{user_sample_1} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ``` r library(ubair) ``` ## Load data for one station and run counterfactual Investigate effect of 9 Euro ticket for station Köln. ### Prerequisites Installation for user as described in README ### Steps 1. adapt the data_dir (in windows e.g. //de-inf-008/\$/Eigene Dateien/ubair-master/data/) ``` r # set data_dir where the data is stored data_dir <- "../../Daten/user_sample_data/" # This might take a few seconds for large files data <- load_uba_data_from_dir(data_dir = data_dir) ``` 2. Set sample variables. ``` r target <- "NO2" station <- "DENW006" station_name <- "Köln" meteo_variables <- c("TMP", "RFE", "WIG", "WIR", "LDR") # dates for 9 Euro effect application_start <- lubridate::ymd("20220301") # = start reference time date_effect_start <- lubridate::ymd_hm("20220601 00:00") application_end <- lubridate::ymd("20220831") # = end effect time buffer <- 0 # number of data points to be ignored before effect trend <- "linear" model_type <- "rf" window_size <- 14 # days of data to calculate the mean in prediction results ``` 3. create a params.yaml 4. either - by copying the default to your working dir, update the new params.yaml file and load it or - load the default and adapt the params programmatically ``` r params <- load_params() # adapt params programatically params$target <- target params$meteo_variables <- meteo_variables ``` 5. Prepare data of station for training. See function documentation for further details ``` r env_data <- clean_data(data, station = station) dt_prepared <- prepare_data_for_modelling(env_data, params) dt_prepared <- dt_prepared[complete.cases(dt_prepared)] split_data <- split_data_counterfactual( dt_prepared, application_start = application_start, application_end = application_end ) ``` 6. Run counterfactual scenario (training and prediction) ``` r res <- run_counterfactual(split_data, params, detrending_function = trend, model_type = model_type, alpha = 0.9, log_transform = FALSE ) predictions <- res$prediction ``` 7. Plot counterfactual run and optionally save to data_dir ``` r counterfactual_plot <- plot_counterfactual(predictions, params, window_size = window_size, date_effect_start, buffer = buffer ) counterfactual_plot ``` ![plot of chunk plot_counter_1](figure/plot_counter_1-1.png) 8. Evaluate model and effect ``` r round(calc_performance_metrics(predictions, date_effect_start, buffer = buffer ), 2) #> RMSE MSE MAE MAPE Bias R2 #> 8.69 75.45 5.83 0.36 -2.16 0.67 #> Coverage lower Coverage upper Coverage Correlation MFB FGE #> 1.00 0.98 0.98 0.84 -0.10 0.41 round(calc_summary_statistics(predictions, date_effect_start, buffer = buffer ), 2) #> true prediction #> min 0.63 -1.02 #> max 92.58 57.05 #> var 228.50 111.89 #> mean 17.76 15.61 #> 5-percentile 3.24 2.31 #> 25-percentile 6.94 7.16 #> median/50-percentile 11.91 13.02 #> 75-percentile 24.65 22.92 #> 95-percentile 48.59 34.90 paste("effect size:", estimate_effect_size(predictions, date_effect_start, buffer = buffer, verbose = FALSE )) #> [1] "effect size: -0.393826299501436" "effect size: -0.0288" ```