[R] Sample size calculation for differences between two very small proportions (Fisher's exact test or others)?

Mon Nov 8 18:13:12 CET 2010

Hi,

I don't have access to the article, but must presume that they are doing something "radically different" if you are "only" getting a total sample size of 20,000. Or is that 20,000 per arm?

Using the G*Power app that Mitchell references below (which I have used previously, since they have a Mac app):

Exact - Proportions: Inequality, two independent groups (Fisher's exact test) 

Options:	Exact distribution

Analysis:	A priori: Compute required sample size 
Input:			Tail(s)                    	=	Two
			Proportion p1              	=	0.00154
			Proportion p2              	=	0.00234
			α err prob                 	=	0.05
			Power (1-β err prob)       	=	0.8
			Allocation ratio N2/N1     	=	1
Output:			Sample size group 1        	=	49851
			Sample size group 2        	=	49851
			Total sample size          	=	99702
			Actual power               	=	0.8168040
			Actual α                   	=	0.0462658

Using the base R power.prop.test() function:

> power.prop.test(p1 = 0.00154, p2 = 0.00234, power = 0.8)

     Two-sample comparison of proportions power calculation 

              n = 47490.34
             p1 = 0.00154
             p2 = 0.00234
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

 NOTE: n is number in *each* group 

Using Frank's bsamsize() function in Hmisc:

> bsamsize(p1 = 0.00154, p2 = 0.00234, fraction = .5, alpha = .05, power = .8)
      n1       n2 
47490.34 47490.34 
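
The identical answers from power.prop.test() and bsamsize() are consistent with both resting on the same normal approximation without a continuity correction. As a rough cross-check, that approximation can be written in closed form. This is a minimal sketch under that assumption, not either function's actual source; prop_n_per_arm is a made-up helper name.

```r
# Closed-form normal-approximation sample size per arm for comparing
# two proportions (two-sided test, equal allocation, no continuity
# correction). prop_n_per_arm is a hypothetical helper name and is not
# part of base R or Hmisc.
prop_n_per_arm <- function(p1, p2, alpha = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - alpha / 2)            # two-sided critical value
  z_beta  <- qnorm(power)                    # quantile for the target power
  p_bar   <- (p1 + p2) / 2                   # pooled proportion under the null
  sd_null <- sqrt(2 * p_bar * (1 - p_bar))   # SD term under the null
  sd_alt  <- sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # SD term under the alternative
  ((z_alpha * sd_null + z_beta * sd_alt) / (p1 - p2))^2
}

n_approx <- prop_n_per_arm(0.00154, 0.00234)
ceiling(n_approx)   # round up to whole subjects per arm
```

The result should land very close to the 47490.34 shown above. The G*Power figure for the exact test is somewhat larger, which is what one would expect given that Fisher's exact test is conservative (note the actual alpha of 0.046 in the G*Power output).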

Finally, throwing together a quick Monte Carlo simulation using the FET, I get:

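TwoSampleFET <- function(n, p1, p2, power = 0.85, R = 5000)
{
  # One replicate: simulate the event counts in two arms of size n and
  # return the two-sided Fisher's exact test p value
  MCSim <- function(n, p1, p2)
  {
    Control <- rbinom(1, n, p1)   # number of events in the control arm
    Treat <- rbinom(1, n, p2)     # number of events in the treatment arm
    # Build the 2 x 2 table explicitly so that an arm with zero events
    # still contributes both rows
    fisher.test(cbind(c(n - Control, Control),
                      c(n - Treat, Treat)))$p.value
  }

  # Run MC Replicates
  MC.res <- replicate(R, MCSim(n, p1, p2))

  # Get p value at power quantile; if it is below alpha, the simulated
  # power at this n is at least 'power'
  quantile(MC.res, power)
}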

# 50,000 per arm
> TwoSampleFET(50000, p1 = 0.00154, p2 = 0.00234, power = 0.8, R = 500)
       80% 
0.04628263 
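
An alternative way to read the same kind of simulation is to estimate the power directly, as the share of replicates with p < alpha, together with its Monte Carlo standard error. The following is only a sketch: fet_power is a made-up helper name, and the seed and the number of replicates are arbitrary choices for illustration.

```r
# Monte Carlo estimate of the power of Fisher's exact test for a given
# n per arm. fet_power is a hypothetical helper; alpha, R and seed are
# illustrative defaults, not values taken from the thread.
fet_power <- function(n, p1, p2, alpha = 0.05, R = 300, seed = 1) {
  set.seed(seed)
  x1 <- rbinom(R, n, p1)   # simulated event counts, arm 1
  x2 <- rbinom(R, n, p2)   # simulated event counts, arm 2
  p_values <- mapply(function(a, b) {
    # 2 x 2 table: rows = no event / event, columns = arms
    fisher.test(cbind(c(n - a, a), c(n - b, b)),
                conf.int = FALSE)$p.value
  }, x1, x2)
  est <- mean(p_values < alpha)
  c(power = est, mc.se = sqrt(est * (1 - est) / R))
}

res <- fet_power(50000, p1 = 0.00154, p2 = 0.00234)
res
```

With only a few hundred replicates the standard error is around 0.02, so a run like this can only say that the power at 50,000 per arm is in the neighborhood of 0.8; a larger R would be needed to pin it down more tightly.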

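So all four of these are coming back with numbers in the range of roughly 47,500 to 50,000 ***per arm***. (The simulation does not return a sample size itself; it confirms that 50,000 per arm is sufficient, since the 80th percentile of the simulated p values falls just below 0.05.)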

HTH,

Marc Schwartz

On Nov 8, 2010, at 10:16 AM, Mitchell Maltenfort wrote:

> Not with R, but look for G*Power3, a free tool for power calc,
> includes Fisher's test.
> 
> http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3
> 
> On Mon, Nov 8, 2010 at 10:52 AM, Giulio Di Giovanni
> <perimessaggini at hotmail.com> wrote:
>> 
>> 
>> Hi,
>> I'm trying to compute the minimum sample size needed to have at least 80% power, with alpha=0.05. The problem is that the empirical proportions are really small: 0.00154 in one case and 0.00234 in the other. These are the estimated failure proportions of two medical treatments.
>> Thomas and Conlon (1992) suggested Fisher's exact test and proposed a computational method, which according to their table gives a sample size of roughly 20000. Unfortunately I cannot find any software applying their method.
>> -Does anyone know how to estimate the sample size for Fisher's exact test using R?
>> -Even better, does anybody know other, maybe optimal, methods for such a situation (small p1 and p2) and the corresponding R software?
>> 
>> Thanks in advance,
>> Giulio