Using the freq_test Function

Brad Cannell

Created: 2017-11-10
Updated: 2022-04-03

library(freqtables)
library(dplyr)
data(mtcars)

This vignette is about testing differences in frequencies/proportions, which is a common task in my data analysis workflow. Additionally, it’s important to me that these tools work well in a dplyr pipeline.

Because we want different behavior for different classes of inputs, freq_test is an S3 generic function with multiple methods.

The name freq_test is easy to reason about, sounds similar to freq_table, which must be the input to freq_test, and is general enough to contain chi-square, fisher, etc.

The flowchart below illustrates the scenarios, and related hypothesis tests, discussed in this vignette.

Table of Contents


Univariate analysis

Two categories

In some cases, we collect data from a single sample of people and measure the proportion of those people who can be classified (or categorized) according to some characteristic of interest. For example, suppose we collected data from a group of students and wanted to know what proportion of those students could be classified as males and females.

In the following code, we generate a small sample of students (n = 20). The student’s gender is drawn at random from a binomial distribution where being male or female are equally likely. Said another way, there is an equal number of male and female students on our hypothetical campus, and we have sampled a random subset of 20 of them without bias. In this context, “without bias” means that every student on our campus was equally likely to turn up in our sample.

set.seed(123)
students <- tibble(
  male = rbinom(20, 1, 0.5)
)

Next, we can calculate the frequency with which each gender appears in our sample of 20 students.

students %>% 
  freq_table(male)
#> # A tibble: 2 × 9
#>   var   cat       n n_total percent    se t_crit   lcl   ucl
#>   <chr> <chr> <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1 male  0         9      20      45  11.4   2.09  23.8  68.2
#> 2 male  1        11      20      55  11.4   2.09  31.8  76.2

The frequencies returned to us by freq_table tell us that our sample includes 9 females (which is 45% of all students in the sample) and 11 males (which is 55% of all students sampled). But, what about the precision of those estimates?

If we are only interested in the proportion of students in our sample that are male and female, then we’re done. We have our answer. Typically, though, our interest extends beyond our little sample. What we really want to know is if the proportion of males and females differ among all students on our hypothetical campus (our superpopulation).

I like the following quote from Daniel Kaplan that discusses sampling variability and its relationship to precision of our estimate:

Less obvious to those starting out in statistics is the idea that a quantity such as the mean or median itself has limited precision. This isn’t a matter of the calculation itself. If the calculation has been done correctly, the resulting quantity is exactly right. Instead the limited precision arises from the sampling process that was used to collect the data. The exact mean or median relates to the particular sample you have collected. Had you collected a different sample, it’s likely that you would have gotten a somewhat different value for the quantities you calculate, the so-called sampling variability. One purpose of statistical inference is to characterize sampling variability, that is, to provide a quantitative meaning for “somewhat different value.”
Kaplan, Daniel. Statistical Modeling: A Fresh Approach (Project MOSAIC Books) (Kindle Locations 844-850). Project MOSAIC Books. Kindle Edition.

Further:

Even if all the individuals in a population were included in a study, the study subjects are viewed as a sample of the potential biologic experience of an even broader conceptual population.
Rothman, Kenneth J. Modern Epidemiology (p. 149). Lippincot (Wolters Kluwer Health). Kindle Edition.

In the case of our estimate above, we are characterizing the precision of our estimates in the form of 95% confidence intervals. Here, “precision” is a rough estimate of our uncertainty around the parameter we are interested in (i.e., the population proportion) based in the information we have available to us (Rothman, 2008). Perhaps it should be called an uncertainty interval or precision interval, rather than a confidence interval. The width of this measure of uncertainty is made up of a combination of sampling variability, the arbitrarily selected value for alpha (i.e., 0.05), the sample size, bias, and the underlying correctness of the model being applied to the data (Rothman, 2008; Kaplan, 2017).

So, for the above sample, we simply state that the percentage of female students was 45.00% (95% confidence interval 23.76% to 68.23%) and the percentage of male students was 55.00% (31.77% to 76.24%).

Test for a difference in proportions

In the analysis above, we found that 45% of our sample was female and 55% of our sample was male. But again, we want to make reasonable estimates of whether or not the proportion of males and females differ among all students on our hypothetical campus. One common way to do so is with a null value hypothesis test. Specifically, in this example, we want to test the null hypothesis that the percentage of males and females among all students on our hypothetical campus are equal.

We can do so using methods based on the sample proportion of cases. They assume the normal approximation to the binomial distribution is valid. This assumption is reasonable when np0q0 ≥ 5 (Rosner, 2015, page 249).

students %>% 
  freq_table(male) %>% 
  
  # Test for equal proportions
  summarise(
    p_bar = n[1] / n_total[1],
    p0 = 0.5,
    n = n_total[1],
    z = abs((p_bar - p0)) / sqrt(p0 * (1 - p0) / n),
    p = 2 * pnorm(z, lower.tail=FALSE)
  )
#> # A tibble: 1 × 5
#>   p_bar    p0     n     z     p
#>   <dbl> <dbl> <int> <dbl> <dbl>
#> 1  0.45   0.5    20 0.447 0.655

We can also get this value from R’s built-in prop.test and R’s built-in chisq.test.

prop.test(9, 20, p = 0.5, correct = FALSE)
#> 
#>  1-sample proportions test without continuity correction
#> 
#> data:  9 out of 20, null probability 0.5
#> X-squared = 0.2, df = 1, p-value = 0.6547
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#>  0.2581979 0.6579147
#> sample estimates:
#>    p 
#> 0.45
chisq.test(c(9, 11), p = c(0.5, 0.5), correct = FALSE)
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  c(9, 11)
#> X-squared = 0.2, df = 1, p-value = 0.6547

Finally, I’ve built this test into freqtables::freq_test, which is designed to work in a dplyr pipeline.

students %>% 
  freq_table(male) %>% 
  freq_test()
#> # A tibble: 2 × 14
#>   var   cat       n n_total percent    se t_crit   lcl   ucl n_expected
#>   <chr> <chr> <int>   <int>   <dbl> <dbl>  <dbl> <dbl> <dbl>      <dbl>
#> 1 male  0         9      20      45  11.4   2.09  23.8  68.2         10
#> 2 male  1        11      20      55  11.4   2.09  31.8  76.2         10
#> # … with 4 more variables: chi2_contrib <dbl>, chi2_pearson <dbl>, df <dbl>,
#> #   p_chi2_pearson <dbl>

Interpretation

We can interpret this result in the following way: The probability of getting data that conflict with the tested hypothesis as much as or more than the observed data conflict with the tested hypothesis is 0.65, if the tested hypothesis is correct and all other assumptions used in computing the p-value are correct. These data are compatible with the null hypothesis of equal proportions of males and females on our hypothetical campus.

Assumptions

  1. The study subjects may be treated as a simple random sample from a population of similar subjects (Daniel, 2013, page 258).

  2. The sampling distribution of p_hat is approximately normally distributed in accordance with the central limit theorem (Daniel, 2013, page 258).

Comparison to other statistical software

Here, and below, will will use the mtcars data for ease of comparison.

mtcars %>% 
  freq_table(am) %>% 
  freq_test() %>% 
  select(1:6, p_chi2_pearson)
#>   var cat  n n_total percent       se p_chi2_pearson
#> 1  am   0 19      32  59.375 8.820997      0.2888444
#> 2  am   1 13      32  40.625 8.820997      0.2888444

 

These results match the results obtained from Stata and SAS (see below).

 


Stata


SAS

References:

Daniel, W. W., & Cross, C. L. (2013). Biostatistics: A foundation for analysis in the health sciences (Tenth). Hoboken, NJ: Wiley.

Kaplan, D. T. (2017). Statistical modeling: A fresh approach (Second). Project MOSAIC Books.

Rosner, B. (2015). Fundamentals of biostatistics (Eighth). MA: Cengage Learning.

Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern epidemiology (Third). Philadelphia, PA: Lippincott Williams & Wilkins.

 

TOC

 


Univariate analysis

More than two categories

In this example we are still testing for equality of proportions. However, now we are testing for equality across more than two categories.

Notice that the syntax doesn’t change. In fact, there isn’t even an additional method needed for this scenario.

mtcars %>% 
  freq_table(cyl) %>% 
  freq_test()
#>   var cat  n n_total percent       se   t_crit      lcl      ucl n_expected
#> 1 cyl   4 11      32  34.375 8.530513 2.039513 19.49961 53.11130   10.66667
#> 2 cyl   6  7      32  21.875 7.424859 2.039513 10.34883 40.44691   10.66667
#> 3 cyl   8 14      32  43.750 8.909831 2.039513 27.09672 61.94211   10.66667
#>   chi2_contrib chi2_pearson df p_chi2_pearson
#> 1   0.01041667       2.3125  2       0.314664
#> 2   1.26041667       2.3125  2       0.314664
#> 3   1.04166667       2.3125  2       0.314664

 

This is the same result we get from SAS. I was unable to find an easy way to do this test in Stata.

 

SAS

TOC

 


Bivariate Analysis

Two categories

Now, suppose we are interested in the relationship between two categorical variables. More precisely, we can estimate the degree of compatibility between our observed values, and the values we would expect to observe if the two characteristics we are analyzing were independent (i.e., knowing the value of one characteristic tells you nothing about the value of the other characteristic).

For example, earlier we found that our data were compatible with the hypothesis that there are equal proportions of females and males on our hypothetical campus – our superpopulation. Now, we suspect that females and males in our superpopulation differ with respect to their binge drinking behavior.

To investigate this hypothesis further, we will collect data from a simple random sample of students, measure their gender, and measure whether they binge drank in the last 30 days or not (yes/no). Then we will use the chi-square test of independence to draw conclusions about the degree of compatibility between our data, and the data we would expect to observe if binge drinking and gender were independent of one another.

set.seed(456)
students <- tibble(
  male  = rbinom(20, 1, 0.5),
  binge = if_else(male == 0, 
                  rbinom(20, 1, 0.10), # 10% of females binge drink 
                  rbinom(20, 1, 0.30)) # 30% of males binge drink
)
students %>% 
  freq_table(male, binge) %>% 
  freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 4 × 26
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
#> 1 male    0       binge   0           8    11      20            40    11.2 
#> 2 male    0       binge   1           3    11      20            15     8.19
#> 3 male    1       binge   0           7     9      20            35    10.9 
#> 4 male    1       binge   1           2     9      20            10     6.88
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> #   ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> #   lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> #   chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> #   p_chi2_pearson <dbl>, p_fisher <dbl>

There are a couple things worth mentioning here. First, 27% of the females in our sample engaged in binge drinking, compared to 22% of males – even though we know that the underlying probability in our superpopulation was 10% for females and 30% for males. This discrepancy highlights the potential for sampling variability, and small sample estimates, to be misleading.

Additionally, because some expected cell counts were less than 5, we should estimate our p-value using Fisher’s exact method.

students %>% 
  freq_table(male, binge) %>% 
  freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 4 × 26
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
#> 1 male    0       binge   0           8    11      20            40    11.2 
#> 2 male    0       binge   1           3    11      20            15     8.19
#> 3 male    1       binge   0           7     9      20            35    10.9 
#> 4 male    1       binge   1           2     9      20            10     6.88
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> #   ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> #   lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> #   chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> #   p_chi2_pearson <dbl>, p_fisher <dbl>

Interpretation

The large p-value (p = 1.0) calculated using Fisher’s exact method indicates that the data are highly consistent with the null hypothesis of statistical independence between gender and binge drinking at our hypothetical campus, assuming that the null hypothesis is true and the study is free of bias. However, wide range of the 95% confidence intervals makes the precision of the point estimates doubtful. Another study using larger sample size is needed to get a more precise estimate of the relationship between gender and binge drinking.

Assumptions

  1. Pretty much all of the tests we have encountered in this book have made an assumption about the independence of data and the chi-square test is no exception. For the chi-square test to be meaningful it is imperative that each person, item or entity contributes to only one cell of the contingency table. Therefore, you cannot use a chi-square test on a repeated-measures design (e.g., if we had trained some cats with food to see if they would dance and then trained the same cats with affection to see if they would dance, we couldn’t analyse the resulting data with Pearson’s chi-square test).

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 818). SAGE Publications. Kindle Edition.

  1. The expected frequencies should be greater than 5. Although it is acceptable in larger contingency tables to have up to 20% of expected frequencies below 5, the result is a loss of statistical power (so the test may fail to detect a genuine effect). Even in larger contingency tables no expected frequencies should be below 1. Howell (2006) gives a nice explanation of why violating this assumption creates problems. If you find yourself in this situation consider using Fisher’s exact test.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 818). SAGE Publications. Kindle Edition.

Comparison to other statistical software

mtcars %>% 
  freq_table(am, vs) %>% 
  freq_test() %>% 
  select(1:8, p_chi2_pearson)
#> # A tibble: 4 × 9
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>
#> 1 am      0       vs      0          12    19      32          37.5
#> 2 am      0       vs      1           7    19      32          21.9
#> 3 am      1       vs      0           6    13      32          18.8
#> 4 am      1       vs      1           7    13      32          21.9
#> # … with 1 more variable: p_chi2_pearson <dbl>

 

Stata


SAS

TOC

 


Bivariate Analysis

More than two categories

mtcars %>% 
  freq_table(am, cyl) %>% 
  freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 6 × 26
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
#> 1 am      0       cyl     4           3    19      32          9.38     5.24
#> 2 am      0       cyl     6           4    19      32         12.5      5.94
#> 3 am      0       cyl     8          12    19      32         37.5      8.70
#> 4 am      1       cyl     4           8    13      32         25        7.78
#> 5 am      1       cyl     6           3    13      32          9.38     5.24
#> 6 am      1       cyl     8           2    13      32          6.25     4.35
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> #   ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> #   lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> #   chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> #   p_chi2_pearson <dbl>, p_fisher <dbl>

 

There is one problem with the chi-square test, which is that the sampling distribution of the test statistic has an approximate chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough to not worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square distribution inaccurate.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 816). SAGE Publications. Kindle Edition.

mtcars %>% 
  freq_table(am, cyl) %>% 
  freq_test()
#> One or more expected cell counts are <= 5. Therefore, Fisher's Exact Test was used.
#> # A tibble: 6 × 26
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
#> 1 am      0       cyl     4           3    19      32          9.38     5.24
#> 2 am      0       cyl     6           4    19      32         12.5      5.94
#> 3 am      0       cyl     8          12    19      32         37.5      8.70
#> 4 am      1       cyl     4           8    13      32         25        7.78
#> 5 am      1       cyl     6           3    13      32          9.38     5.24
#> 6 am      1       cyl     8           2    13      32          6.25     4.35
#> # … with 17 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> #   ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> #   lcl_row <dbl>, ucl_row <dbl>, n_col <int>, n_expected <dbl>,
#> #   chi2_contrib <dbl>, chi2_pearson <dbl>, r <int>, c <int>, df <dbl>,
#> #   p_chi2_pearson <dbl>, p_fisher <dbl>

Interpret

Compare to Stata and SAS

 

Comparison to Stata and SAS


Stata


SAS

TOC

 


Calculate Pearson’s Chi-square

Pearson’s chi-square is given by:

\[\chi^2=\sum \frac{(observed_{ij} - expected_{ij})^2}{expected_{ij}}\]

where,

\[expected_{ij} = \frac{row \ total * column \ total}{total}\]

For example:

mtcars %>% 
  freq_table(am, vs) %>% 
  freq_test() %>% 
  select(1:4, n_col, n_total, n_expected:chi2_pearson)
#> # A tibble: 4 × 9
#>   row_var row_cat col_var col_cat n_col n_total n_expected chi2_contrib
#>   <chr>   <chr>   <chr>   <chr>   <int>   <int>      <dbl>        <dbl>
#> 1 am      0       vs      0          18      32      10.7         0.161
#> 2 am      0       vs      1          14      32       8.31        0.207
#> 3 am      1       vs      0          18      32       7.31        0.236
#> 4 am      1       vs      1          14      32       5.69        0.303
#> # … with 1 more variable: chi2_pearson <dbl>

After obtaining the Pearson chi-squared value (0.9068826 above), we compare it to the critical value for the chi-square distribution with degrees of freedom = (rows - 1)(columns - 1) = 1. If the observed value is bigger than the critical value, we conclude that there is a statistically significant relationship.

For example:

mtcars %>% 
  freq_table(am, vs) %>% 
  freq_test() %>% 
  select(1:3, chi2_pearson:p_chi2_pearson)
#> # A tibble: 4 × 8
#>   row_var row_cat col_var chi2_pearson     r     c    df p_chi2_pearson
#>   <chr>   <chr>   <chr>          <dbl> <int> <int> <dbl>          <dbl>
#> 1 am      0       vs             0.907     2     2     1          0.341
#> 2 am      0       vs             0.907     2     2     1          0.341
#> 3 am      1       vs             0.907     2     2     1          0.341
#> 4 am      1       vs             0.907     2     2     1          0.341

And this is the same result we get from R’s built-in chi-square test.

chisq.test(mtcars$am, mtcars$vs, correct = FALSE)
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  mtcars$am and mtcars$vs
#> X-squared = 0.90688, df = 1, p-value = 0.3409

However, our result has the following advantages (in my opinion):

Why not use Yate’s continuity correction?

When you have a 2 × 2 contingency table (i.e., two categorical variables each with two categories) then Pearson’s chi-square tends to produce significance values that are too small (in other words, it tends to make a Type I error). Therefore, Yates suggested a correction to the Pearson formula (usually referred to as Yates’s continuity correction). The basic idea is that when you calculate the deviation from the model (the observedij – modelij in equation (18.1)) you subtract 0.5 from the absolute value of this deviation before you square it. In plain English, this means you calculate the deviation, ignore whether it is positive or negative, subtract 0.5 from the value and then square it.

The key thing to note is that it lowers the value of the chi-square statistic and, therefore, makes it less significant. Although this seems like a nice solution to the problem there is a fair bit of evidence that this overcorrects and produces chi-square values that are too small! Howell (2006) provides an excellent discussion of the problem with Yates’s correction for continuity, if you’re interested; all I will say is that, although it’s worth knowing about, it’s probably best ignored.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 817). SAGE Publications. Kindle Edition.

Why not use Pearson’s chi-square with small sample sizes?

There is one problem with the chi-square test, which is that the sampling distribution of the test statistic has an approximate chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough to not worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square distribution inaccurate. This is why you often read that to use the chi-square test the expected frequencies in each cell must be greater than 5.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 816). SAGE Publications. Kindle Edition.

Assumptions

  1. Independence of observations. Cannot use chi-square test on repeated or nested measures.

  2. The expected frequencies should be greater than 5. Expected frequencies below 5 results in a loss of statistical power, and may fail to detect genuine effects (Type II error).

TOC

 


Calculate Fisher’s Exact Test

Fisher’s exact test (Fisher, 1922) is not so much a test as a way of computing the exact probability of a statistic. It was designed originally to overcome the problem that with small samples the sampling distribution of the chi-square statistic deviates substantially from a chi-square distribution. It should be used with small samples.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 918). SAGE Publications. Kindle Edition.

Fisher’s exact test is computationally intensive. In order to get a feel for how it works, I’ve replicated a simple example from Wolfram MathWorld below.

In this example, X is a journal, say either Mathematics Magazine or Science, and Y is the number of articles on the topics of mathematics and biology appearing in a given issue of one of these journals. If Mathematics Magazine has five articles on math and one on biology, and Science has none on math and four on biology, then the relevant matrix would be:

m_observed <- matrix(c(5, 0, 1, 4), nrow = 2, byrow = TRUE)
m_observed
#>      [,1] [,2]
#> [1,]    5    0
#> [2,]    1    4

Then calculate the conditional probability of getting the actual matrix given the particular row and column sums using the multivariate generalization of the hypergeometric probability function, which can be calculated as:

p <- function(m) {
  
  # Calculate the marginal totals
  r1  <- sum(m[1, ])
  r2  <- sum(m[2, ])
  c1  <- sum(m[, 1])
  c2  <- sum(m[, 2])
  tot <- sum(m)
  
  # Calculate the conditional probability of getting the actual matrix given 
  # the particular row and column sums
  r1_fac  <- factorial(r1)
  r2_fac  <- factorial(r2)
  c1_fac  <- factorial(c1)
  c2_fac  <- factorial(c2)
  tot_fac <- factorial(tot)
  
  m_fac      <- factorial(m) # factorial of each cell of the matrix (i.e., 5!, 0!, 1!, 4!)
  prod_m_fac <- prod(m_fac)  # Multiply all of those values together
  
  numerator   <- r1_fac * r2_fac * c1_fac * c2_fac
  denominator <- tot_fac * (prod_m_fac)
  p <- numerator / denominator
  
  # Return p
  p
}

So, in the example above, the p value for our observed data (p_cutoff) equals:

p_cutoff <- p(m_observed)
p_cutoff
#> [1] 0.02380952

Next, we need to calculate the p value for all other combinations of values that could have existed given the marginal totals in our observed matrix “m”.

For example,

4 1
2 3

would give us the same marginal totals (i.e., 5, 5, 6, 4)

Below I calculate the p for each possible combination, and save them into a vector for use in the next step. In this case, the possible combinations are given on the Wolfram Mathworks website. I’m still searching for an efficient algorithm that will find these combinations.

# Create a empty vector to hold the p values created in the loop below
p_values <- vector(mode = "numeric")

# Create a list of all possible combinations, given our margins
combinations <- list(
  c(4, 1 ,2, 3),
  c(3, 2, 3, 2),
  c(2, 3, 4, 1),
  c(1, 4, 5, 0)
)

# Calculate the p value for each combination and save it
for (i in seq_along(combinations)) {
  
  # Turn into a matrix
  m <- matrix(combinations[[i]], nrow = 2, byrow = TRUE)
  
  # Caclulate p
  p_val <- p(m)
  
  # Save p to vector
  p_values <- c(p_values, p_val)
}

These p-values should all sum to 1

sum(p_values, p_cutoff)
#> [1] 1

Next, the P-value of the test can be simply computed by the sum of all P-values which are <= p_cutoff (including p_cutoff itself).

df <- tibble(
  p_value  = p_values,
  p_cutoff = p_cutoff
) %>% 
  mutate(
    less_or_equal = p_value <= p_cutoff
  ) %>% 
  print()
#> # A tibble: 4 × 3
#>   p_value p_cutoff less_or_equal
#>     <dbl>    <dbl> <lgl>        
#> 1  0.238    0.0238 FALSE        
#> 2  0.476    0.0238 FALSE        
#> 3  0.238    0.0238 FALSE        
#> 4  0.0238   0.0238 TRUE
df %>% 
  filter(less_or_equal) %>% 
  summarise(
    `p-value` = sum(p_value, p_cutoff)
  )
#> # A tibble: 1 × 1
#>   `p-value`
#>       <dbl>
#> 1    0.0476

And for comparison, let’s check our results against R’s built-in function for Fisher’s Exact Test.

fisher.test(m_observed)
#> 
#>  Fisher's Exact Test for Count Data
#> 
#> data:  m_observed
#> p-value = 0.04762
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#>  1.024822      Inf
#> sample estimates:
#> odds ratio 
#>        Inf

TOC