[R] (no subject)

Fri Nov 18 18:33:24 CET 2005

Anna Pluzhnikov <apluzhni at bsd.uchicago.edu> writes:

> Hi,
> I need to run a Fisher's exact test on thousands of 2x2 contingency tables, and
> repeat this process several thousand times (this is a part of the permutation
> test for a genome-wide association study).
> 
> How can I run this process most efficiently? Is there any way to optimize R code?
>  
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and use
> apply inside the loop:
>   fisherPval <- function(x) {
>     fisher.test(x)$p.value
>   }
>   for (iter in 1:1000) {
>     apply(data, 3, fisherPval)
>   }
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz processor.
> 
> Thanks in advance. 

Applying phyper() appropriately should save you quite a bit of time,
especially if you're pragmatic and just use the two one-sided tests
rather than the two-sided one, which is a bit harder to compute.
(Notice that phyper() is vectorized over all its arguments.)

As in:

> M <- array(rpois(2*2*5000,lambda=20),c(2,2,500000))
> x <- M[1,1,]
> m <- M[1,1,]+M[2,1,]
> n <- M[1,2,]+M[2,2,]
> k <- M[1,1,]+M[1,2,]
> system.time(pleft<-phyper(x,m,n,k))
[1] 2.16 0.01 2.16 0.00 0.00
> sum(pleft < 0.05)
[1] 16400
> sum(pleft < 0.05)/500000
[1] 0.0328
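
For completeness, here is a minimal sketch of both one-sided tests, with a
spot-check against fisher.test() on a handful of tables. The table count N,
the seed, and the names pright and check are assumptions made for the
illustration; they are not part of the session above.

```r
set.seed(1)   # arbitrary seed, only so the sketch is reproducible
N <- 1000     # hypothetical number of 2x2 tables
M <- array(rpois(2 * 2 * N, lambda = 20), c(2, 2, N))

x <- M[1, 1, ]                 # count in cell [1,1]
m <- M[1, 1, ] + M[2, 1, ]     # column 1 total
n <- M[1, 2, ] + M[2, 2, ]     # column 2 total
k <- M[1, 1, ] + M[1, 2, ]     # row 1 total

# Lower tail: P(X <= x), i.e. alternative = "less"
pleft <- phyper(x, m, n, k)
# Upper tail: P(X >= x), i.e. alternative = "greater";
# note the x - 1, because lower.tail = FALSE gives P(X > q)
pright <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# Spot-check the first few tables against fisher.test()
check <- 1:20
ref_left <- apply(M[, , check], 3, function(tab)
  fisher.test(tab, alternative = "less", conf.int = FALSE)$p.value)
ref_right <- apply(M[, , check], 3, function(tab)
  fisher.test(tab, alternative = "greater", conf.int = FALSE)$p.value)
stopifnot(isTRUE(all.equal(pleft[check], ref_left)),
          isTRUE(all.equal(pright[check], ref_right)))
```

Only the small spot-check loops over tables; the phyper() calls handle all N
tables in one vectorized pass, which is where the time saving should come from.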

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907