# [R] Sample Size Determination to Compare Three Independent Proportions

Marc Schwartz marc_schwartz at me.com
Tue Aug 10 15:28:52 CEST 2021

```
Hi,

A search would suggest that there may not be an R function/package that
provides power/sample size calculations for the specific scenarios that
you are describing. There may be something that I am missing, and there
is also other dedicated software such as PASS
(https://www.ncss.com/software/pass/) which is not free, but provides a
large library of possibly relevant functions and support.

That being said, you can run Monte Carlo simulations in R to achieve the
results you want, while providing yourself with options relative to
study design, intended tests, and adjustments for multiple comparisons
as apropos. Many prefer this approach, since it gives you specific
control over this process.

Taking the simple case, where you are going to run a 3 x 2 chi-square as
your primary endpoint, and want to power for that, here is a possible
function, with the same sample size in each group:

ThreeGroups <- function(n, p1, p2, p3, R = 10000, power = 0.8) {

  MCSim <- function(n, p1, p2, p3) {
    ## Draw 'n' binary (0/1) outcomes for each group
    G1 <- rbinom(n, 1, p1)
    G2 <- rbinom(n, 1, p2)
    G3 <- rbinom(n, 1, p3)

    ## Create a matrix containing the 3 group counts. Fixed factor
    ## levels keep both the 0 and 1 counts even if one never occurs
    MAT <- cbind(table(factor(G1, levels = 0:1)),
                 table(factor(G2, levels = 0:1)),
                 table(factor(G3, levels = 0:1)))

    ## Perform a chi-square test and just return the p value
    chisq.test(MAT)$p.value
  }

  ## Replicate the above R times, and get
  ## a distribution of p values
  MC <- replicate(R, MCSim(n, p1, p2, p3))

  ## Get the p value at the desired "power" quantile
  quantile(MC, power)
}

Essentially, the above internal MCSim() function generates 3 random
samples of size 'n' from the binomial distribution, at the 3 proportions
desired. For each run, it performs a chi-square test on the 3 x 2
matrix of counts and returns the p value. The main function then
returns the p value at the requested quantile ('power', 0.8 by default)
of the simulated distribution of p values.

You can look at the help pages for the various functions that I use
above, to get a sense for how they work.

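You increase the sample size ('n') until the returned p value is <=
0.05, if that is your desired alpha level. When the 80th percentile of
the simulated p values is at or below alpha, at least 80% of the
simulated studies reject the null hypothesis, which is the target power.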

You also want 'R', the number of replications within each run, to be
large enough so that the returned p value quantile is relatively stable.
Once you get "close to" the desired p value, values for 'R' should be on
the order of 1,000,000 or higher. Stay with lower values for 'R' until
you get in the ballpark of your target, since larger values take much
longer to run.

Thus, using your example proportions of 0.25, 0.25, and 0.35:

## 250 per group, 750 total - Not enough
> ThreeGroups(250, 0.25, 0.25, 0.35, R = 10000)
80%
0.08884723

## 350 per group, 1050 total - Too high
> ThreeGroups(350, 0.25, 0.25, 0.35, R = 10000)
80%
0.0270829

## 300 per group, 900 total - Close!
> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
80%
0.04818842

So, keep tweaking the sample size until you get a returned p value at
your target alpha level, with a large enough 'R', so that you get
consistent sample sizes for multiple runs.

If I run 300 per group again, with 10,000 replicates:

> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
80%
0.05033933

the returned p value is slightly higher. So, again, increase R to
improve the stability of the returned p value and run it multiple times
to be comfortable that the p value change is less than an acceptable
threshold.

Now, the tricky part is to decide whether the 3 x 2 test is your primary
endpoint and you want to power only for that, or whether you also want
to power for the other two-group comparisons. In the latter case, you
may have to account for p value adjustments for the multiple
comparisons, which means powering those tests at a lower alpha level.
In that scenario, you would end up taking the largest sample size that
you identify across the various hypotheses, recognizing that while you
are powering for one hypothesis, you may be overpowering for others.

That is something that you need to decide, and perhaps consider
consulting with other local statistical expertise, as may be apropos, in
the prospective study design, possibly influenced by other
relevant/similar research in your domain.

You can easily modify the above function for the two-group scenario as
well, and I will leave that to you.

Regards,

Marc

AbouEl-Makarim Aboueissa wrote on 8/10/21 6:34 AM:
> Hi Marc:
>
> First, thank you very much for your help in this matter.
>
>
> Will perform an initial omnibus test of all three groups (e.g. 3 x 2
> chi-square), possibly followed by
> all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3,
> 2 versus 3),
>
> We can assume _either_ the desired sample size in each group is the same
> _or_ proportional to the population size.
>
>   We can set p=0.25 and set p1=p2=p3=p so that the H0 is true.
>
> We can assume that the expected proportion of "Yes" values in each group
> is 0.25
>
> For the alternative hypotheses, for example,  we can set  p1 = .25,
> p2=.25, p3=.35
>
>
> Again thank you very much in advance.
>
> abou
>
> ______________________
>
> *AbouEl-Makarim Aboueissa, PhD*
> *Professor, Statistics and Data Science*
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
>
>
>
> On Mon, Aug 9, 2021 at 10:53 AM Marc Schwartz <marc_schwartz using me.com
> <mailto:marc_schwartz using me.com>> wrote:
>
>     Hi,
>
>     You are going to need to provide more information than what you have
>     below and I may be mis-interpreting what you have provided.
>
>     Presuming you are designing a prospective, three-group, randomized
>     allocation study, there is typically an a priori specification of the
>     ratios of the sample sizes for each group such as 1:1:1, indicating
>     that the desired sample size in each group is the same.
>
>     You would also need to specify the expected proportions of "Yes" values
>     in each group.
>
>     Further, you need to specify how you are going to compare the
>     proportions in each group. Are you going to perform an initial omnibus
>     test of all three groups (e.g. 3 x 2 chi-square), possibly followed by
>     all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2
>     versus 3), or are you just going to compare 2 versus 1, and 3 versus 1,
>     where 1 is a control group?
>
>     Depending upon your testing plan, you may also need to account for p
>     value adjustments for multiple comparisons, in which case, you also
>     need to specify what adjustment method you plan to use, to know what
>     the target alpha level will be.
>
>     On the other hand, if you already have the data collected, thus have
>     fixed sample sizes available per your wording below, simply go ahead
>     and perform your planned analyses, as the notion of "power" is largely an a
>     priori consideration, which reflects the probability of finding a
>     "statistically significant" result at a given alpha level, given that
>     your a priori assumptions are valid.
>
>     Regards,
>
>     Marc Schwartz
>
>
>     AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:
>      > Dear All: good morning
>      >
>      > *Re:* Sample Size Determination to Compare Three Independent Proportions
>      >
>      > *Situation:*
>      >
>      > Three Binary variables (Yes, No)
>      >
>      > Three independent populations with fixed sizes (*say:* N1 = 1500,
>      > N2 = 900, N3 = 1350).
>      >
>      > Power = 0.80
>      >
>      > How to choose the sample sizes to compare the three proportions
>      > of “Yes” among the three variables.
>      >
>      > If you know a reference to this topic, it will be very helpful too.
>      >
>      > with many thanks in advance
>      >
>      > abou
>      > ______________________
>      >
>      >
>      > *AbouEl-Makarim Aboueissa, PhD*
>      >
>      > *Professor, Statistics and Data Science*
>      > *Graduate Coordinator*
>      >
>      > *Department of Mathematics and Statistics*
>      > *University of Southern Maine*
>      >
>

```
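
The reply above reads off the p value at the "power" quantile and compares it with alpha. An equivalent way to look at the same simulation is to estimate power directly, as the proportion of simulated studies with a p value at or below alpha, and to report a Monte Carlo standard error so that run-to-run variation is visible. The sketch below does that. `PowerThreeGroups()` is a hypothetical helper, not part of the thread, and the default `R = 2000` is an arbitrary choice for a quick run. It draws the "Yes" counts per group with `rbinom(3, n, p)`, which assumes equal group sizes as in the original function.

```r
## Hypothetical helper (not from the thread): estimate power as the
## proportion of simulated p values at or below alpha.
PowerThreeGroups <- function(n, p, R = 2000, alpha = 0.05) {
  stopifnot(length(p) == 3, all(p > 0 & p < 1))

  pvals <- replicate(R, {
    ## Number of "Yes" responses in each of the 3 groups of size n
    yes <- rbinom(3, n, p)

    ## 3 x 2 matrix of Yes/No counts; both columns are always present
    MAT <- cbind(Yes = yes, No = n - yes)

    ## Warnings about small expected counts are suppressed here
    suppressWarnings(chisq.test(MAT)$p.value)
  })

  ## A column of all zeros gives a NaN p value; treat it as a non-rejection
  reject <- !is.na(pvals) & pvals <= alpha
  power <- mean(reject)

  ## Binomial standard error of the estimated power
  c(power = power, se = sqrt(power * (1 - power) / R))
}

## Compare the three per-group sample sizes tried in the thread
set.seed(1)
sizes <- c(250, 300, 350)
res <- sapply(sizes, PowerThreeGroups, p = c(0.25, 0.25, 0.35))
colnames(res) <- sizes
round(res, 3)
```

Each column of `res` holds the estimated power and its standard error for one per-group sample size. Increasing `R` shrinks the standard error roughly in proportion to `1 / sqrt(R)`, which is one way to judge when the estimate is stable enough.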