# [R] Sample size Determination to Compare Three Independent Proportions

AbouEl-Makarim Aboueissa @boue|m@k@r|m1962 @end|ng |rom gm@||@com
Wed Aug 11 11:21:01 CEST 2021

```
Hi Marc:

Thank you for your help in this matter.

With thanks
Abou

On Tue, Aug 10, 2021, 9:28 AM Marc Schwartz <marc_schwartz using me.com> wrote:

> Hi,
>
> A search would suggest that there may not be an R function/package that
> provides power/sample size calculations for the specific scenarios that
> you are describing. There may be something that I am missing, and there
> is also other dedicated software such as PASS
> (https://www.ncss.com/software/pass/) which is not free, but provides a
> large library of possibly relevant functions and support.
>
> That being said, you can run Monte Carlo simulations in R to achieve the
> results you want, while providing yourself with options relative to
> study design, intended tests, and adjustments for multiple comparisons
> as apropos. Many prefer this approach, since it gives you specific
> control over this process.
>
> Taking the simple case, where you are going to run a 3 x 2 chi-square as
> your primary endpoint, and want to power for that, here is a possible
> function, with the same sample size in each group:
>
> ThreeGroups <- function(n, p1, p2, p3, R = 10000, power = 0.8) {
>
>    MCSim <- function(n, p1, p2, p3) {
>      ## Draw n Bernoulli (0/1) outcomes for each group
>      G1 <- rbinom(n, 1, p1)
>      G2 <- rbinom(n, 1, p2)
>      G3 <- rbinom(n, 1, p3)
>
>      ## Create a 2 x 3 matrix of No/Yes counts, one column per group;
>      ## fixing the factor levels keeps both rows even if a group is all 0s or all 1s
>      MAT <- sapply(list(G1, G2, G3),
>                    function(g) table(factor(g, levels = 0:1)))
>
>      ## Perform a chi-square test and return just the p value
>      chisq.test(MAT)$p.value
>    }
>
>    ## Replicate the above R times, and get
>    ## a distribution of p values
>    MC <- replicate(R, MCSim(n, p1, p2, p3))
>
>    ## Get the p value at the desired "power" quantile
>    quantile(MC, power)
> }
>
> Essentially, the above internal MCSim() function generates 3 random
> samples of size 'n' from the binomial distribution, at the 3 proportions
> desired. For each run, it will perform a chi-square test of the 3 x 2
> matrix of counts, returning the p value for each run. The main function
> will then return the p value at the quantile (power) within the
> generated distribution of p values.
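
## Editorial sketch, not part of the original message: the same simulation
## can estimate power directly as the share of simulated p values at or
## below alpha. 'ThreeGroupsPower' and 'alpha' are hypothetical names
## introduced here. Checking quantile(MC, 0.8) <= alpha is, up to quantile
## interpolation, the same as checking that this share is >= 0.8.
ThreeGroupsPower <- function(n, p1, p2, p3, R = 10000, alpha = 0.05) {
  pvals <- replicate(R, {
    ## Number of "Yes" responses in each of the three groups of size n
    yes <- rbinom(3, n, c(p1, p2, p3))
    ## 3 x 2 table of Yes/No counts; both columns are always present
    suppressWarnings(chisq.test(cbind(yes, n - yes))$p.value)
  })
  ## A NaN p value (an all-zero column) is counted as a non-rejection
  mean(!is.na(pvals) & pvals <= alpha)
}

## Possible usage, with the proportions discussed in this thread:
## ThreeGroupsPower(300, 0.25, 0.25, 0.35)
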
>
> You can look at the help pages for the various functions that I use
> above, to get a sense for how they work.
>
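> You increase the sample size ('n') until the returned p value is <=
> 0.05, if that is your desired alpha level. When the 80th percentile of
> the simulated p values is at or below alpha, at least 80% of the runs
> reject the null hypothesis, which is the desired power.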
>
> You also want 'R', the number of replications within each run, to be
> large enough so that the returned p value quantile is relatively stable.
> Values for 'R', once you get "close to" the desired p value should be on
> the order of 1,000,000 or higher. Stay with lower values for 'R' until
> you get in the ballpark of your target, since larger values take much
> longer to run.
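
## Editorial sketch, not part of the original message: a rough guide to how
## large 'R' needs to be. It assumes power is estimated as the share of runs
## with p <= alpha, in which case the estimate is a binomial proportion.
## 'mc_se' is a hypothetical helper name.
mc_se <- function(power, R) sqrt(power * (1 - power) / R)

## Monte Carlo standard error near 80% power for R = 10,000 and R = 1,000,000
mc_se(0.8, c(1e4, 1e6))   # 0.004 0.0004
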
>
> Thus, using your example proportions of 0.25, 0.25, and 0.35:
>
> ## 250 per group, 750 total - Not enough
>  > ThreeGroups(250, 0.25, 0.25, 0.35, R = 10000)
>         80%
> 0.08884723
>
> ## 350 per group, 1050 total - More than needed
>  > ThreeGroups(350, 0.25, 0.25, 0.35, R = 10000)
>        80%
> 0.0270829
>
> ## 300 per group, 900 total - Close!
>  > ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
>         80%
> 0.04818842
>
>
> So, keep tweaking the sample size until you get a returned p value at
> your target alpha level, with a large enough 'R', so that you get
> consistent sample sizes for multiple runs.
>
> If I run 300 per group again, with 10,000 replicates:
>
>  > ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
>         80%
> 0.05033933
>
> the returned p value is slightly higher. So, again, increase R to
> improve the stability of the returned p value and run it multiple times
> to be comfortable that the p value change is less than an acceptable
> threshold.
>
> Now, the tricky part is to decide whether the 3 x 2 test is your primary
> endpoint and you want to power only for that, or whether you also want to
> power for the two-group comparisons, possibly having to account for p value
> adjustments for the multiple comparisons, resulting in the need to power
> for a lower alpha level for those tests. In that scenario, you would end
> up taking the largest sample size that you identify across the various
> hypotheses, recognizing that while you are powering for one hypothesis,
> you may be overpowering for others.
>
> That is something that you need to decide, and perhaps consider
> consulting with other local statistical expertise, as may be apropos, in
> the prospective study design, possibly influenced by other
> relevant/similar research in your domain.
>
> You can easily modify the above function for the two-group scenario as
> well, and I will leave that to you.
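
## Editorial sketch, not part of the original message: one possible
## two-group version for a pairwise 2 x 2 comparison. 'TwoGroupsPower' and
## 'alpha' are hypothetical names. The default alpha = 0.05 / 3 assumes a
## Bonferroni adjustment over three pairwise tests; use whatever adjustment
## the study plan specifies.
TwoGroupsPower <- function(n, p1, p2, R = 10000, alpha = 0.05 / 3) {
  pvals <- replicate(R, {
    ## Number of "Yes" responses in each of the two groups of size n
    yes <- rbinom(2, n, c(p1, p2))
    ## chisq.test() applies Yates' continuity correction to 2 x 2 tables
    ## by default
    suppressWarnings(chisq.test(cbind(yes, n - yes))$p.value)
  })
  ## Estimated power: share of runs rejecting at the adjusted alpha
  mean(!is.na(pvals) & pvals <= alpha)
}

## Possible usage: group 1 (0.25) versus group 3 (0.35), 400 per group
## TwoGroupsPower(400, 0.25, 0.35)
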
>
> Regards,
>
> Marc
>
>
> AbouEl-Makarim Aboueissa wrote on 8/10/21 6:34 AM:
> > Hi Marc:
> >
> > First, thank you very much for your help in this matter.
> >
> >
> > We will perform an initial omnibus test of all three groups (e.g. 3 x 2
> > chi-square), possibly followed by all possible 2 x 2 pairwise
> > comparisons (e.g. 1 versus 2, 1 versus 3, 2 versus 3).
> >
> > We can assume _either_ the desired sample size in each group is the same
> > _or_ proportional to the population size.
> >
> >   We can set p=0.25 and set p1=p2=p3=p so that the H0 is true.
> >
> > We can assume that the expected proportion of "Yes" values in each group
> > is 0.25
> >
> > For the alternative hypotheses, for example,  we can set  p1 = .25,
> > p2=.25, p3=.35
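
## Editorial sketch, not part of the original message: allocation
## proportional to the population sizes quoted in this thread (N1 = 1500,
## N2 = 900, N3 = 1350). 'alloc' and 'PowerUnequal' are hypothetical names.
N <- c(1500, 900, 1350)
alloc <- function(total, N) round(total * N / sum(N))
alloc(900, N)   # 360 216 324

## Simulated power of the omnibus chi-square test with group sizes 'n'
## (a vector) and "Yes" proportions 'p' (a vector of the same length)
PowerUnequal <- function(n, p, R = 10000, alpha = 0.05) {
  pvals <- replicate(R, {
    yes <- rbinom(length(n), n, p)
    suppressWarnings(chisq.test(cbind(yes, n - yes))$p.value)
  })
  mean(!is.na(pvals) & pvals <= alpha)
}

## Possible usage:
## PowerUnequal(alloc(900, N), c(0.25, 0.25, 0.35))
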
> >
> >
> > Again thank you very much in advance.
> >
> > abou
> >
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Statistics and Data Science*
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> >
> >
> >
> > On Mon, Aug 9, 2021 at 10:53 AM Marc Schwartz <marc_schwartz using me.com> wrote:
> >
> >     Hi,
> >
> >     You are going to need to provide more information than what you have
> >     below and I may be mis-interpreting what you have provided.
> >
> >     Presuming you are designing a prospective, three-group, randomized
> >     allocation study, there is typically an a priori specification of the
> >     ratios of the sample sizes for each group such as 1:1:1, indicating
> >     that the desired sample size in each group is the same.
> >
> >     You would also need to specify the expected proportions of "Yes"
> >     values in each group.
> >
> >     Further, you need to specify how you are going to compare the
> >     proportions in each group. Are you going to perform an initial
> >     omnibus test of all three groups (e.g. 3 x 2 chi-square), possibly
> >     followed by all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2,
> >     1 versus 3, 2 versus 3), or are you just going to compare 2 versus 1,
> >     and 3 versus 1, where 1 is a control group?
> >
> >     Depending upon your testing plan, you may also need to account for p
> >     value adjustments for multiple comparisons, in which case, you also
> >     need to specify what adjustment method you plan to use, to know what
> >     the target alpha level will be.
> >
> >     On the other hand, if you already have the data collected, thus have
> >     fixed sample sizes available per your wording below, simply go ahead
> >     and perform your planned analyses, as the notion of "power" is largely
> >     an a priori consideration, which reflects the probability of finding a
> >     "statistically significant" result at a given alpha level, given that
> >     your a priori assumptions are valid.
> >
> >     Regards,
> >
> >     Marc Schwartz
> >
> >
> >     AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:
> >      > Dear All: good morning
> >      >
> >      > *Re:* Sample Size Determination to Compare Three Independent Proportions
> >      >
> >      > *Situation:*
> >      >
> >      > Three Binary variables (Yes, No)
> >      >
> >      > Three independent populations with fixed sizes (*say:* N1 = 1500,
> >      > N2 = 900, N3 = 1350).
> >      >
> >      > Power = 0.80
> >      >
> >      > How to choose the sample sizes to compare the three proportions
> >      > of “Yes” among the three variables.
> >      >
> >      > If you know a reference to this topic, it will be very helpful
> >      > too.
> >      >
> >      > with many thanks in advance
> >      >
> >      > abou
> >      > ______________________
> >      >
> >      >
> >      > *AbouEl-Makarim Aboueissa, PhD*
> >      >
> >      > *Professor, Statistics and Data Science*