Comparison of Simulated Distribution to Theoretical Distribution or Empirical Data

Headrick and Kowalchuk (2007) outlined a general method for comparing a simulated distribution \(\Large Y\) to a given theoretical distribution \(\Large Y^*\). The steps, worked through in the example below, can easily be modified for comparison to an empirical vector of data.

Example

Use these steps to compare a simulated exponential(mean = 2) variable to the theoretical exponential(mean = 2) density. (Note that the printr package is invoked to display the tables.)

Step 1: Obtain the standardized cumulants

In R, the exponential distribution is parameterized by rate = 1/mean, so a mean of 2 corresponds to rate = 0.5.

library("SimMultiCorrData")
library("printr")
stcums <- calc_theory(Dist = "Exponential", params = 0.5)
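As a cross-check that does not depend on the package, the standardized cumulants of an exponential distribution can be computed in base R from its cumulants, \(\kappa_n = (n-1)!/\lambda^n\). This is only a sketch; the names kappa, std_cum and theory are illustrative and are not part of SimMultiCorrData.

```r
rate <- 0.5                          # rate = 1 / mean, so mean = 2
n <- 1:6
kappa <- factorial(n - 1) / rate^n   # cumulants of an exponential(rate)
sigma <- sqrt(kappa[2])              # standard deviation
std_cum <- kappa / sigma^n           # standardized cumulants (orders 3+ are meaningful)

theory <- c(mean = kappa[1], sd = sigma, skew = std_cum[3],
            skurtosis = std_cum[4], fifth = std_cum[5], sixth = std_cum[6])
theory
```

These values should agree with the target summary reported in Step 2 (mean 2, sd 2, skew 2, skurtosis 6, fifth 24, sixth 120).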

Step 2: Simulate the variable

Note that calc_theory returns the standard deviation, not the variance. The simulation functions require the variance as input, so stcums[2] is squared in the call.

H_exp <- nonnormvar1("Polynomial", means = stcums[1], vars = stcums[2]^2, 
                     skews = stcums[3], skurts = stcums[4], 
                     fifths = stcums[5], sixths = stcums[6], n = 10000, 
                     seed = 1234)

## Constants: Distribution  1  
## 
## Constants calculation time: 0 minutes 
## Total Simulation time: 0 minutes

Look at the power method constants.

as.matrix(H_exp$constants, nrow = 1, ncol = 6, byrow = TRUE)

c0	c1	c2	c3	c4	c5
-0.3077396	0.8005605	0.318764	0.0335001	-0.0036748	0.0001587
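A quick way to sanity-check these constants is to confirm that the polynomial \(p(Z)\) of a standard normal \(Z\) has mean 0 and variance 1. The base R sketch below does this analytically from the moments of the standard normal. It assumes the constants as printed above (rounded to seven decimals), so the results are only approximate; norm_moment is a hypothetical helper, not a package function.

```r
# Constants copied (rounded) from the table above: c0, ..., c5
cons <- c(-0.3077396, 0.8005605, 0.318764, 0.0335001, -0.0036748, 0.0001587)

# k-th raw moment of a standard normal: 0 for odd k, (k - 1)!! for even k
norm_moment <- function(k) {
  if (k %% 2 == 1) return(0)
  if (k == 0) return(1)
  prod(seq(1, k - 1, by = 2))
}

pow <- 0:5
# E[p(Z)] = sum_i c_i E[Z^i]
p_mean <- sum(cons * sapply(pow, norm_moment))
# E[p(Z)^2] = sum_{i,j} c_i c_j E[Z^(i + j)]
m2 <- outer(pow, pow, function(i, j) sapply(i + j, norm_moment))
p_var <- sum(outer(cons, cons) * m2) - p_mean^2

c(mean = p_mean, variance = p_var)
```

Both values should be very close to 0 and 1 respectively, which is why the simulated variable is formed as mean + sd * p(Z).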

Look at a summary of the target distribution.

as.matrix(round(H_exp$summary_targetcont[, c("Distribution", "mean", "sd", 
                                             "skew", "skurtosis", "fifth", 
                                             "sixth")], 5), nrow = 1, ncol = 7,
          byrow = TRUE)

	Distribution	mean	sd	skew	skurtosis	fifth	sixth
mean	1	2	2	2	6	24	120

Compare to a summary of the simulated distribution.

as.matrix(round(H_exp$summary_continuous[, c("Distribution", "mean", "sd", 
                                             "skew", "skurtosis", "fifth", 
                                             "sixth")], 5), nrow = 1, ncol = 7,
          byrow = TRUE)

	Distribution	mean	sd	skew	skurtosis	fifth	sixth
X1	1	1.99987	2.0024	2.03382	6.18067	23.74145	100.3358

Step 3: Determine if the constants generate a valid power method pdf

H_exp$valid.pdf

## [1] "TRUE"
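One condition usually stated for a valid power method pdf is that the polynomial \(p(z)\) be strictly increasing. The sketch below checks \(p'(z) > 0\) on a finite grid using the rounded constants printed in Step 2. It is a rough illustration only and is not necessarily how the package determines valid.pdf; the grid range and step are arbitrary choices.

```r
# Constants copied (rounded) from Step 2: c0, ..., c5
cons <- c(-0.3077396, 0.8005605, 0.318764, 0.0335001, -0.0036748, 0.0001587)

# Coefficients of p'(z) = c1 + 2 c2 z + 3 c3 z^2 + 4 c4 z^3 + 5 c5 z^4
d_cons <- cons[-1] * 1:5

z_grid <- seq(-10, 10, by = 0.001)
dp <- drop(outer(z_grid, 0:4, "^") %*% d_cons)   # p'(z) at each grid point

min_slope <- min(dp)        # smallest slope found on the grid
increasing <- all(dp > 0)   # TRUE if p is increasing across the grid
c(min_slope = min_slope, increasing = increasing)
```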

Step 4: Select a critical value

Let \(\Large \alpha = 0.05\).

y_star <- qexp(1 - 0.05, rate = 0.5) # note that rate = 1/mean
y_star

## [1] 5.991465
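For the exponential distribution the critical value also has a closed form, \(y^* = -\log(\alpha)/\lambda\), which gives an independent check on the qexp call. The variable names in this sketch are illustrative.

```r
alpha <- 0.05
rate <- 0.5                        # rate = 1 / mean
y_closed <- -log(alpha) / rate     # upper-tail quantile in closed form
y_qexp <- qexp(1 - alpha, rate = rate)
c(closed_form = y_closed, qexp = y_qexp)
```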

Step 5: Solve for \(\Large z'\)

Since the exponential(2) distribution has a mean and standard deviation equal to 2, solve \(\Large 2 p(z') + 2 - y^* = 0\) for \(\Large z'\), where \(\Large y^*\) is the critical value stored in y_star. Here, \(\Large p(z') = c_0 + c_1 z' + c_2 z'^2 + c_3 z'^3 + c_4 z'^4 + c_5 z'^5\).

f_exp <- function(z, c, y) {
  return(2 * (c[1] + c[2] * z + c[3] * z^2 + c[4] * z^3 + c[5] * z^4 + 
                c[6] * z^5) + 2 - y)
}

z_prime <- uniroot(f_exp, interval = c(-1e06, 1e06), 
                   c = as.numeric(H_exp$constants), y = y_star)$root
z_prime

## [1] 1.644926

Step 6: Calculate \(\Large 1 - \Phi(z')\)

1 - pnorm(z_prime)

## [1] 0.04999249

This upper-tail probability is approximately equal to the \(\Large \alpha\) value of 0.05, indicating that the method provides a good approximation to the actual distribution at this critical value.
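The same tail probability can be estimated by simulation: draw standard normal values, apply the polynomial, rescale to mean 2 and standard deviation 2, and count the proportion above the critical value. This is a minimal base R sketch using the rounded constants printed in Step 2; power_poly is a hypothetical helper, and the sample size and seed are arbitrary.

```r
set.seed(1234)

# Constants copied (rounded) from Step 2: c0, ..., c5
cons <- c(-0.3077396, 0.8005605, 0.318764, 0.0335001, -0.0036748, 0.0001587)

# Evaluate p(z) = c0 + c1 z + ... + c5 z^5 for a vector z
power_poly <- function(z, cons) {
  drop(outer(z, seq_along(cons) - 1, "^") %*% cons)
}

z <- rnorm(1e5)
y <- 2 + 2 * power_poly(z, cons)       # mean + sd * p(z)
y_star <- qexp(1 - 0.05, rate = 0.5)   # critical value from Step 4

tail_prob <- mean(y > y_star)          # Monte Carlo estimate of P(Y > y_star)
tail_prob
```

The estimate should fall near 0.05, up to Monte Carlo error.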

Step 7: Plot graphs

plot_sim_pdf_theory(sim_y = H_exp$continuous_variable[, 1], 
                    Dist = "Exponential", params = 0.5)

We can also plot the empirical cdf and show the cumulative probability up to y_star.

plot_sim_cdf(sim_y = H_exp$continuous_variable[, 1], calc_cprob = TRUE, 
             delta = y_star)

Calculate descriptive statistics.

as.matrix(t(stats_pdf(c = H_exp$constants[1, ], method = "Polynomial", 
                    alpha = 0.025, mu = stcums[1], sigma = stcums[2])))

trimmed_mean	median	mode	max_height
1.858381	1.384521	0.104872	1.094213
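The reported median can be checked by hand. If \(p(z)\) is increasing, the median of \(Y\) occurs at \(z = 0\), so it equals \(\mu + \sigma c_0\). The sketch below uses the rounded value of \(c_0\) printed in Step 2 and compares the result with the theoretical exponential median; the variable names are illustrative.

```r
c0 <- -0.3077396   # first constant, copied (rounded) from Step 2
mu <- 2            # mean of the exponential(mean = 2)
sigma <- 2         # standard deviation

median_pm <- mu + sigma * c0               # median implied by the power method pdf
median_theory <- qexp(0.5, rate = 0.5)     # theoretical median, 2 * log(2)
c(power_method = median_pm, theoretical = median_theory)
```

The power method median agrees with the stats_pdf output above and lies close to the theoretical value of about 1.3863.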