[R] significance in difference of proportions

Mon Dec 1 18:41:06 CET 2003

Hello,

thanks for the replies on this subject. I'm using fisher.test to test whether
the proportions in my 2 samples differ (see Ted's example below).

The assumption was that the two samples are from the same population and that
they may contain a different number of "positives" (due to different
treatment). 

I may be able to figure out the true probability of getting a "positive", since
for some of my experiments I know the entire population. E.g. the samples
(111 items and 10 items) come from a population of 10,000 items, and I know
that there are 200 positives in the population.

Is it possible to use the Fisher test for testing equality of proportions
and to include the known probability of finding a positive? Would that make
sense at all? If the two samples come from the same population, the
probability of finding a positive shouldn't influence the test for difference
of proportions, should it?
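
As a rough sketch of how the known population could enter: one could compare
each sample separately with the population, using the hypergeometric
distribution (sampling without replacement). This assumes that the counts
from the earlier example (9 of 111 and 0 of 10) belong to the population
described above, and the variable names are only illustrative. Note that it
asks a different question from fisher.test: whether each sample is consistent
with the known proportion, not whether the two samples agree with each other.

```r
# Known population (figures from the example above)
N_pop <- 10000             # population size
K_pos <- 200               # positives in the population
p_known <- K_pos / N_pop   # known probability of a positive

# Assumed observed counts, taken from the earlier example in this thread
n_A <- 111; x_A <- 9
n_B <- 10;  x_B <- 0

# P(at least x_A positives in a sample of n_A drawn without replacement)
p_A_upper <- phyper(x_A - 1, K_pos, N_pop - K_pos, n_A, lower.tail = FALSE)

# P(no positives at all in a sample of n_B)
p_B_zero <- dhyper(x_B, K_pos, N_pop - K_pos, n_B)

print(c(p_known = p_known, p_A_upper = p_A_upper, p_B_zero = p_B_zero))
```

A small p_A_upper would suggest that sample A holds more positives than random
sampling from this population would explain; p_B_zero shows how unremarkable
an empty sample of 10 is under the same assumption.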

At some point I'd like to extend the statistics so that the two samples can
come from 2 different populations (with known probability for the positives).

I'm happy to receive suggestions and comments on this.

	thanks a lot again for your help,

	Arne 

> 
> On 27-Nov-03 Arne.Muller at aventis.com wrote:
> > I've 2 samples A (111 items) and B (10 items) drawn from the same
> > unknown population. Within A I find 9 "positives" and in B 0
> > positives. I'd like to know if the 2 samples A and B are different,
> > ie is there a way to find out whether the number of "positives" is
> > significantly different in A and B?
> 
> Pretty obviously not, just from looking at the numbers:
> 
> 9 out of 111 -> p = P(positive) approx = 1/10
> 
> P(0 out of 10 when p = 1/10) is not unlikely (in fact = 0.35).
> 
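
The figure above can be checked with the binomial density. A minimal sketch,
assuming the 10 items of B are independent draws; the second call uses the
unrounded estimate 9/111 instead of 1/10 (variable names are illustrative):

```r
# Rounded estimate from the message above: p = 1/10
p_round <- dbinom(0, size = 10, prob = 1/10)   # 0.9^10, about 0.35

# Unrounded estimate from sample A: p = 9/111
p_hat <- dbinom(0, size = 10, prob = 9/111)

print(c(p_round = p_round, p_hat = p_hat))
```

Either way, finding no positives among 10 items is far from unusual.
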
> However, a Fisher exact test will give you a respectable P-value:
> 
> > library(ctest)
> > ?fisher.test
> > fisher.test(matrix(c(102,9,10,0),nrow=2))
>   [...]
>   p-value = 1
>   alternative hypothesis: true odds ratio is not equal to 1 
>   95 percent confidence interval:
>    0.000000 6.088391 
> > fisher.test(matrix(c(102,9,9,1),nrow=2))
>   p-value = 0.5926
> > fisher.test(matrix(c(102,9,8,2),nrow=2))
>   p-value = 0.2257
> > fisher.test(matrix(c(102,9,7,3),nrow=2))
>   p-value = 0.0605
> > fisher.test(matrix(c(102,9,6,4),nrow=2))
>   p-value = 0.01202
> 
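
The sequence of calls above can be condensed into a loop. This is only a
sketch of the same calculation, with sample A held fixed at 102 negatives and
9 positives; in current versions of R, fisher.test() is in the stats package,
so no library() call should be needed. The names k and p_values are mine.

```r
# Sample A is held fixed at 102 negatives and 9 positives;
# sample B has 10 items, of which `pos` are positive.
k <- 0:4
p_values <- sapply(k, function(pos) {
  tab <- matrix(c(102, 9, 10 - pos, pos), nrow = 2)
  fisher.test(tab)$p.value
})

print(data.frame(positives_in_B = k, p_value = signif(p_values, 4)))
```
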
> So there's a 95% CI (0, 6.1) for the odds ratio. For identical
> probabilities of "+" the odds ratio is 1.0, hence well within the CI.
> And, keeping the numbers for the larger sample fixed for
> simplicity, you have to go quite a way with the smaller one to get
> a result significant at 5%:
> 
> (102,9):(7,3) -> P = 0.06
> (102,9):(6,4) -> P = 0.01
> 
> and, to have 80% power (0.8 probability of this event), the
> probability of "+" in the second sample would have to be as
> high as 0.41.
> 
> Conclusion: your second sample size is quite inadequate except
> to detect rather large differences between the true proportions
> in the two cases!
> 
> Best wishes,
> Ted.
> 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 167 1972
> Date: 27-Nov-03                                       Time: 17:43:00
> ------------------------------ XFMail ------------------------------
>