[R] Sample size Determination to Compare Three Independent Proportions
AbouEl-Makarim Aboueissa
@boue|m@k@r|m1962 @end|ng |rom gm@||@com
Wed Aug 11 11:21:01 CEST 2021
Hi Marc:
Thank you for your help in this matter.
With thanks
Abou
On Tue, Aug 10, 2021, 9:28 AM Marc Schwartz <marc_schwartz using me.com> wrote:
> Hi,
>
> A search would suggest that there may not be an R function/package that
> provides power/sample size calculations for the specific scenarios that
> you are describing. There may be something that I am missing, and there
> is also other dedicated software such as PASS
> (https://www.ncss.com/software/pass/) which is not free, but provides a
> large library of possibly relevant functions and support.
>
> That being said, you can run Monte Carlo simulations in R to achieve the
> results you want, while providing yourself with options relative to
> study design, intended tests, and adjustments for multiple comparisons
> as apropos. Many prefer this approach, since it gives you specific
> control over this process.
>
> Taking the simple case, where you are going to run a 3 x 2 chi-square as
> your primary endpoint, and want to power for that, here is a possible
> function, with the same sample size in each group:
>
> ThreeGroups <- function(n, p1, p2, p3, R = 10000, power = 0.8) {
>
>   MCSim <- function(n, p1, p2, p3) {
>     ## Simulate n binary (0/1) outcomes for each group. Fixing the
>     ## factor levels keeps both outcome rows in every table, even
>     ## if a group happens to be all 0s or all 1s
>     G1 <- factor(rbinom(n, 1, p1), levels = 0:1)
>     G2 <- factor(rbinom(n, 1, p2), levels = 0:1)
>     G3 <- factor(rbinom(n, 1, p3), levels = 0:1)
>
>     ## Create a 2 x 3 matrix of counts (outcome by group)
>     MAT <- cbind(table(G1), table(G2), table(G3))
>
>     ## Perform a chi-square test and return just the p value
>     chisq.test(MAT)$p.value
>   }
>
>   ## Replicate the above R times, and get
>   ## a distribution of p values
>   MC <- replicate(R, MCSim(n, p1, p2, p3))
>
>   ## Get the p value at the desired "power" quantile
>   ## (na.rm drops the rare NaN from a degenerate table)
>   quantile(MC, power, na.rm = TRUE)
> }
>
> Essentially, the internal MCSim() function above generates 3 random
> samples of size 'n' from the binomial distribution, at the 3 proportions
> desired. For each run, it will perform a chi-square test of the 3 x 2
> matrix of counts, returning the p value for each run. The main function
> will then return the p value at the 'power' quantile (by default, the
> 80th percentile) of the generated distribution of p values. If that
> value is <= alpha, then at least that fraction of the simulated tests
> were significant at alpha.
>
> You can look at the help pages for the various functions that I use
> above, to get a sense for how they work.
>
> You increase the sample size ('n') until you get a p value returned <=
> 0.05, if that is your desired alpha level.
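
The same check can be read the other way around: rather than asking whether the p value at the 80% quantile is <= alpha, estimate power directly as the share of simulated p values at or below alpha. The sketch below is only an illustration of that idea. The function name power_three_groups() and its defaults are hypothetical and not part of the function quoted above, and it draws the "Yes" counts per group directly instead of simulating individual 0/1 values.

```r
## Hypothetical helper (not from the quoted post): estimate power as the
## proportion of simulated chi-square p values at or below alpha.
power_three_groups <- function(n, p, R = 2000, alpha = 0.05) {
  pvals <- replicate(R, {
    yes <- rbinom(length(p), n, p)         # "Yes" count in each group
    tab <- rbind(yes = yes, no = n - yes)  # 2 x 3 table, always full size
    suppressWarnings(chisq.test(tab)$p.value)
  })
  ## NaN p values (degenerate tables) are dropped, which is an assumption
  mean(pvals <= alpha, na.rm = TRUE)
}

set.seed(1)
pw <- power_three_groups(300, c(0.25, 0.25, 0.35))
pw
```

With this formulation, 'n' is increased until the returned value reaches the target power (0.80 here), which should lead to roughly the same sample size as the quantile approach.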
>
> You also want 'R', the number of replications within each run, to be
> large enough so that the returned p value quantile is relatively stable.
> Values for 'R', once you get "close to" the desired p value should be on
> the order of 1,000,000 or higher. Stay with lower values for 'R' until
> you get in the ballpark of your target, since larger values take much
> longer to run.
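
To get a rough feel for how 'R' affects stability, one can look at the Monte Carlo standard error. The sketch below assumes power is being estimated as the share of simulated p values at or below alpha (a binomial proportion near 0.80), which is a slightly different quantity from the p value quantile discussed above, so treat it as a guide to the order of magnitude only.

```r
## Binomial standard error of a simulated power estimate near 0.80,
## for several choices of R (the number of replications)
R_values <- c(1e3, 1e4, 1e5, 1e6)
mc_se <- sqrt(0.8 * 0.2 / R_values)
data.frame(R = R_values, se = signif(mc_se, 2))
```

At R = 10,000 the standard error is about 0.004, so run-to-run differences of that size are to be expected. Each 100-fold increase in R cuts the standard error by a factor of 10.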
>
> Thus, using your example proportions of 0.25, 0.25, and 0.35:
>
> ## 250 per group, 750 total - Not enough
> > ThreeGroups(250, 0.25, 0.25, 0.35, R = 10000)
> 80%
> 0.08884723
>
> ## 350 per group, 1050 total - Too high
> > ThreeGroups(350, 0.25, 0.25, 0.35, R = 10000)
> 80%
> 0.0270829
>
> ## 300 per group, 900 total - Close!
> > ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
> 80%
> 0.04818842
>
>
> So, keep tweaking the sample size until you get a returned p value at
> your target alpha level, with a large enough 'R', so that you get
> consistent sample sizes for multiple runs.
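
One possible way to pick a starting value for 'n' before simulating is the usual noncentral chi-square approximation to the power of the omnibus test. This is a sketch under the assumption of equal group sizes. The function name approx_power() is hypothetical, and the approximation is a cross-check on the simulation, not a replacement for it.

```r
## Approximate power of the 3 x 2 chi-square test with n per group,
## using the noncentral chi-square distribution (equal group sizes assumed)
approx_power <- function(n, p, alpha = 0.05) {
  pbar <- mean(p)                                      # pooled proportion
  ncp  <- n * sum((p - pbar)^2) / (pbar * (1 - pbar))  # noncentrality
  df   <- length(p) - 1
  pchisq(qchisq(1 - alpha, df), df, ncp = ncp, lower.tail = FALSE)
}

p <- c(0.25, 0.25, 0.35)
n_grid <- 200:400

## Smallest n per group on the grid with approximate power >= 0.80
n_start <- n_grid[which(sapply(n_grid, approx_power, p = p) >= 0.80)[1]]
n_start
```

For these proportions the approximation lands a little under 300 per group, in line with the simulated results above, so the simulation only needs to be run for a few values of 'n' around that.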
>
> If I run 300 per group again, with 10,000 replicates:
>
> > ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
> 80%
> 0.05033933
>
> the returned p value is slightly higher. So, again, increase R to
> improve the stability of the returned p value and run it multiple times
> to be comfortable that the p value change is less than an acceptable
> threshold.
>
> Now, the tricky part is to decide if the 3 x 2 is your primary endpoint,
> and you want to power only for that, or, if you also want to power for the
> other two-group comparisons, possibly having to account for p value
> adjustments for the multiple comparisons, resulting in the need to power
> for a lower alpha level for those tests. In that scenario, you would end
> up taking the largest sample size that you identify across the various
> hypotheses, recognizing that while you are powering for one hypothesis,
> you may be overpowering for others.
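
As a rough illustration of how an adjusted alpha changes the requirement for a two-group comparison, power.prop.test() from the stats package can be used for the 0.25 versus 0.35 contrast. The Bonferroni adjustment for 3 pairwise tests is an assumption made only for this example. The choice of adjustment method is part of the testing plan.

```r
## Per-group n for the 0.25 vs 0.35 pairwise comparison at 80% power,
## unadjusted and with a Bonferroni-adjusted alpha (3 tests assumed)
unadj <- power.prop.test(p1 = 0.25, p2 = 0.35, power = 0.80,
                         sig.level = 0.05)
bonf  <- power.prop.test(p1 = 0.25, p2 = 0.35, power = 0.80,
                         sig.level = 0.05 / 3)

ceiling(c(unadjusted = unadj$n, bonferroni = bonf$n))
```

The adjusted alpha requires the larger per-group sample size. Note that a comparison between two groups that share the same true proportion (0.25 versus 0.25) cannot be powered at all, since there is no difference to detect.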
>
> That is something that you need to decide, and perhaps consider
> consulting with other local statistical expertise, as may be apropos, in
> the prospective study design, possibly influenced by other
> relevant/similar research in your domain.
>
> You can easily modify the above function for the two-group scenario as
> well, and I will leave that to you.
>
> Regards,
>
> Marc
>
>
> AbouEl-Makarim Aboueissa wrote on 8/10/21 6:34 AM:
> > Hi Marc:
> >
> > First, thank you very much for your help in this matter.
> >
> >
> > I will perform an initial omnibus test of all three groups (e.g. 3 x 2
> > chi-square), possibly followed by
> > all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3,
> > 2 versus 3).
> >
> > We can assume _either_ the desired sample size in each group is the same
> > _or_ proportional to the population size.
> >
> > We can set p=0.25 and set p1=p2=p3=p so that the H0 is true.
> >
> > We can assume that the expected proportion of "Yes" values in each group
> > is 0.25
> >
> > For the alternative hypotheses, for example, we can set p1 = .25,
> > p2=.25, p3=.35
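
For the case where group sample sizes are proportional to the population sizes, a sketch along the following lines may help. It uses the population sizes from the original question (N1 = 1500, N2 = 900, N3 = 1350) and the noncentral chi-square approximation for the omnibus test. The function name approx_power_alloc() and the total of 900 are hypothetical choices for illustration.

```r
## Approximate power of the omnibus chi-square test when the total
## sample is split in proportion to the population sizes
N <- c(1500, 900, 1350)
w <- N / sum(N)               # allocation fractions: 0.40, 0.24, 0.36
p <- c(0.25, 0.25, 0.35)      # alternative hypothesis proportions

approx_power_alloc <- function(total, w, p, alpha = 0.05) {
  n    <- total * w                         # per-group sample sizes
  pbar <- sum(n * p) / sum(n)               # weighted pooled proportion
  ncp  <- sum(n * (p - pbar)^2) / (pbar * (1 - pbar))
  df   <- length(p) - 1
  pchisq(qchisq(1 - alpha, df), df, ncp = ncp, lower.tail = FALSE)
}

pw_alloc <- approx_power_alloc(900, w, p)
pw_alloc
```

The total can then be varied until the approximate power reaches 0.80, and the result confirmed by simulation with the per-group sizes rounded to whole numbers.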
> >
> >
> > Again thank you very much in advance.
> >
> > abou
> >
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Statistics and Data Science*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> >
> >
> >
> > On Mon, Aug 9, 2021 at 10:53 AM Marc Schwartz <marc_schwartz using me.com
> > <mailto:marc_schwartz using me.com>> wrote:
> >
> > Hi,
> >
> > You are going to need to provide more information than what you have
> > below and I may be mis-interpreting what you have provided.
> >
> > Presuming you are designing a prospective, three-group, randomized
> > allocation study, there is typically an a priori specification of the
> > ratios of the sample sizes for each group such as 1:1:1, indicating
> > that the desired sample size in each group is the same.
> >
> > You would also need to specify the expected proportions of "Yes"
> > values in each group.
> >
> > Further, you need to specify how you are going to compare the
> > proportions in each group. Are you going to perform an initial
> > omnibus test of all three groups (e.g. 3 x 2 chi-square), possibly
> > followed by all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2,
> > 1 versus 3, 2 versus 3), or are you just going to compare 2 versus 1,
> > and 3 versus 1, where 1 is a control group?
> >
> > Depending upon your testing plan, you may also need to account for p
> > value adjustments for multiple comparisons, in which case, you also
> > need to specify what adjustment method you plan to use, to know what
> > the target alpha level will be.
> >
> > On the other hand, if you already have the data collected, thus have
> > fixed sample sizes available per your wording below, simply go ahead
> > and perform your planned analyses, as the notion of "power" is largely
> > an a priori consideration, which reflects the probability of finding a
> > "statistically significant" result at a given alpha level, given that
> > your a priori assumptions are valid.
> >
> > Regards,
> >
> > Marc Schwartz
> >
> >
> > AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:
> > > Dear All: good morning
> > >
> > > *Re:* Sample Size Determination to Compare Three Independent
> > > Proportions
> > >
> > > *Situation:*
> > >
> > > Three Binary variables (Yes, No)
> > >
> > > Three independent populations with fixed sizes (*say:* N1 = 1500,
> > > N2 = 900, N3 = 1350).
> > >
> > > Power = 0.80
> > >
> > > How to choose the sample sizes to compare the three proportions
> > > of “Yes” among the three variables.
> > >
> > > If you know a reference to this topic, it will be very helpful
> > > too.
> > >
> > > with many thanks in advance
> > >
> > > abou
> > > ______________________
> > >
> > >
> > > *AbouEl-Makarim Aboueissa, PhD*
> > >
> > > *Professor, Statistics and Data Science*
> > > *Graduate Coordinator*
> > >
> > > *Department of Mathematics and Statistics*
> > > *University of Southern Maine*
> > >
> >
>