--- title: "Coding Simulations" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{coding-simulations} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(ordinalsimr) ``` # Coding Your Own Simulations This guide will provide a rough overview of how to code your own simulations using the components of this package should you find the Shiny application too limiting for your own purposes. Key information on functions: * `run_simulations()` will take simulation input parameters and return a named list of tibbles. Each tibble represents the results of a simulation at a given sample size, with names e.g. "sample_size_32". * `bind_rows()` is used to combine the tibbles into a single tibble. * `calculate_power_t2error()` and `calculate_t1_error()` can receive the p-value data frames for performing T1 Error, Power, and T2 Error calculations with confidence intervals. See function documentation for additional arguments. ## Power and Type II Error ```{r, warning=FALSE} sim_results <- run_simulations( sample_size = 80, sample_prob = c(0.5, 0.5), prob0 = c(0.1, 0.2, 0.3, 0.4), prob1 = c(0.6, 0.2, 0.1, 0.1), niter = 20 ) formatted_results <- dplyr::bind_rows(sim_results) names(formatted_results) formatted_results %>% dplyr::select( Wilcoxon, Fisher, `Chi Squared (No Correction)`, `Chi Squared (Correction)`, `Prop. Odds`, `Coin Indep. Test`, sample_size ) %>% calculate_power_t2error(alpha = 0.05, power_confidence_int = 95) ``` ## Type I Error To find the Type I error of a distribution, the code from before is largely unchanged except for the fact that the probability vectors set in `run_simulations` must now be equivalent and the `calculate_t1_error()` function is now applied. ```{r, warning=FALSE} sim_results <- run_simulations( sample_size = 30:35, sample_prob = c(0.5, 0.5), prob0 = c(.4, .3, .3), prob1 = c(.8, .1, .1), # note the matching probabilities between groups niter = 50 ) formatted_results <- dplyr::bind_rows(sim_results) names(formatted_results) formatted_results %>% dplyr::select( Wilcoxon, Fisher, `Chi Squared (No Correction)`, `Chi Squared (Correction)`, `Prop. Odds`, `Coin Indep. Test`, sample_size ) %>% calculate_t1_error(alpha = 0.05, t1_error_confidence_int = 95) ``` ## Mapping Over Many Sample Sizes The current version of the Shiny application can only accept a range of sample sizes. However, you may find it useful to iterate over a discrete set of sample sizes and calculate the Power and Type II error for each rather than a range. This can save time and computation power when you are only interested in a few specific sample sizes. Depending on how big the number of iterations per sample sizes (`niter`) and the actual number of sample sizes being checked, it may only be practical to do this in a parallelized manner with e.g. {furrr} or {parallel}. In any case, an example of such code is included below: ```{r, warning=FALSE} # Create a vector of sample sizes sample_sizes <- c(30, 50, 100) # Map over the sample sizes lapply(sample_sizes, function(x) { run_simulations( sample_size = x, sample_prob = c(0.5, 0.5), prob0 = c(0.1, 0.2, 0.3, 0.4), prob1 = c(0.6, 0.2, 0.1, 0.1), niter = 100 ) %>% dplyr::bind_rows() %>% dplyr::select( Wilcoxon, Fisher, `Chi Squared (No Correction)`, `Chi Squared (Correction)`, `Prop. Odds`, `Coin Indep. Test`, sample_size ) %>% calculate_power_t2error() }) ``` And if you're more of a {purrr} person: ``` r sample_sizes %>% purrr::map( ~run_simulations( sample_size = .x, sample_prob = c(0.5, 0.5), prob0 = c(0.1, 0.2, 0.3, 0.4), prob1 = c(0.6, 0.2, 0.1, 0.1), niter = 100 ) %>% bind_rows() %>% select(Wilcoxon, Fisher, `Chi Squared\n(No Correction)`, `Chi Squared\n(Correction)`, `Prop. Odds`, `Coin Indep. Test`, sample_size) %>% calculate_power_t2error() ) ``` ## Adjust Multiple Parameters It is perhaps more likely that analysts will want to iterate simulations over a variety of different parameters at once. The code below provides a structure for creating a combination grid based on the 5 input parameters that can be altered; this example can easily be altered to include desired parameters by replacing/removing/expanding the listed parameters. ### Set Parameters Note that the `prob0_list` and `prob1_list` must always be of the same length **and** the corresponding sub-elements of the list must also be of the same length. Put in terms of the application, there must be a Group 2 if there is a Group 1, and the vector representing the number of possible outcomes must be the same length for these 2 groups. If performing simulations on one distribution to evaluate Type I error, it is only necessary to form one probability list. This object can be recycled for the probabilities of both groups. ```{r, warning=FALSE} # Choose sample sizes sample_size <- c(50, 100) # Set sample distributions as a proportion c(group1, group2) sample_prob <- list(c(0.5, 0.5), c(0.75, 0.25)) # Trial 1 has matching probabilities between the 2 groups. Trial 2 has non-matching probabilities prob0_list <- list(trial1_group1 = c(0.1, 0.2, 0.3, 0.4), trial2_group1 = c(0.1, 0.2, 0.3, 0.4)) prob1_list <- list(trial1_group2 = c(0.1, 0.2, 0.3, 0.4), trial2_group2 = c(0.2, 0.3, 0.3, 0.2)) # Number of iterations niter <- c(20, 100) ``` ### Create Simulation Grid Use `tidyr::expand_grid()` rather than `base:expand.grid()` because the former creates a tibble by default, and this supports the nested tibble structure which I'm relying on. (This can, of course, be approached in other ways if desired.) ```{r, warning=FALSE} # Use tidyr::expand_grid as it creates a tibble, supporting the nested tibble structure info_table <- tidyr::expand_grid( sample_size, sample_prob, prob0_list, prob1_list, niter ) info_table ``` ### Run Simulation The example below is complete in running the simulations and calculating the Power and Type II error. However, the same code can be applied to either calculate the Type I error or use the p-values for other purposes. ```{r, warning=FALSE} # Calculate either Power/T2 error or T1 error depending on your specific needs many_sims <- mapply( ordinalsimr::run_simulations, sample_size = info_table$sample_size, sample_prob = info_table$sample_prob, prob0 = info_table$prob0_list, prob1 = info_table$prob1_list, niter = info_table$niter ) length(many_sims) many_sims[1] ``` And again for the {purrr} folks: ``` r info_table %>% purrr::pmap( ~ run_simulations( sample_size = ..1, sample_prob = ..2, prob0 = ..3, prob1 = ..4, niter = ..5 ) %>% bind_rows() %>% magrittr::extract2("p_values") %>% calculate_power_t2error(), .progress = TRUE ) ``` Note that even with relatively small sample sizes and a number of iterations 1-3 magnitudes less than normally specified for simulation studies, processing 32 different simulations took ~20-30 seconds.