[R] (no subject)
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Fri Nov 18 18:33:24 CET 2005
Anna Pluzhnikov <apluzhni at bsd.uchicago.edu> writes:
> Hi,
> I need to run a Fisher's exact test on thousands of 2x2 contingency tables, and
> repeat this process several thousand times (this is a part of the permutation
> test for a genome-wide association study).
>
> How can I run this process most efficiently? Is there any way to optimize the R code?
>
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and use
> apply inside the loop:
> fisherPval <- function(x) {
>   fisher.test(x)$p.value
> }
> for (iter in 1:1000) {
>   pvals <- apply(data, 3, fisherPval)  # one p-value per 2x2 table
> }
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz processor.
>
> Thanks in advance.
The appropriate application of phyper() should save you quite a bit of time,
especially if you're pragmatic and just use the two one-sided tests
rather than the two-sided one, which is a bit harder to compute.
(Notice that phyper() is vectorized over all its arguments.)
As in:
> M <- array(rpois(2*2*5000,lambda=20),c(2,2,500000))
> x <- M[1,1,]
> m <- M[1,1,]+M[2,1,]
> n <- M[1,2,]+M[2,2,]
> k <- M[1,1,]+M[1,2,]
> system.time(pleft<-phyper(x,m,n,k))
[1] 2.16 0.01 2.16 0.00 0.00
> sum(pleft < 0.05)
[1] 16400
> sum(pleft < 0.05)/500000
[1] 0.0328
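A minimal R sketch of the same idea, assuming the tables are held in a 2 x 2 x N array as in the question. It computes both one-sided tails with phyper() and checks them against fisher.test(). For comparison, it also computes a two-sided p-value with dhyper(). It applies the rule described in ?fisher.test for 2 x 2 tables: sum the probabilities of all tables no more likely than the observed one. The small relative tolerance (1e-7) is an assumption intended to mirror fisher.test()'s own code, and fisherTwoSided() is a hypothetical helper, not part of R. The sketch draws 2*2*N counts so that every table is distinct. The transcript above draws only 2*2*5000 counts for a 2 x 2 x 500000 array, and array() fills the remaining cells by recycling, so its 500,000 tables are 100 copies of 5,000 distinct ones.

```r
# Sketch: one-sided and two-sided Fisher p-values for many 2 x 2 tables.
# Assumes tables are stored as a 2 x 2 x N array, as in the question.
set.seed(42)
N <- 2000
# Draw 2*2*N counts so every table is distinct (no recycling by array()).
M <- array(rpois(2 * 2 * N, lambda = 20), c(2, 2, N))

x <- M[1, 1, ]              # top-left cell count
m <- M[1, 1, ] + M[2, 1, ]  # first column total
n <- M[1, 2, ] + M[2, 2, ]  # second column total
k <- M[1, 1, ] + M[1, 2, ]  # first row total

# Both one-sided tails, vectorized over all N tables.
pleft  <- phyper(x, m, n, k)                          # P(X <= x)
pright <- phyper(x - 1, m, n, k, lower.tail = FALSE)  # P(X >= x)

# Hypothetical helper: two-sided p-value for one table (odds ratio 1),
# summing the probabilities of all tables in the support that are no
# more likely than the observed table. This still loops over tables.
fisherTwoSided <- function(x, m, n, k) {
  lo <- max(0, k - n)
  hi <- min(k, m)
  d <- dhyper(lo:hi, m, n, k)
  min(1, sum(d[d <= d[x - lo + 1] * (1 + 1e-7)]))
}
ptwo <- mapply(fisherTwoSided, x, m, n, k)

# Spot-check the first few tables against fisher.test().
idx <- 1:10
ref <- function(alt) {
  sapply(idx, function(i) fisher.test(M[, , i], alternative = alt)$p.value)
}
stopifnot(isTRUE(all.equal(pleft[idx],  ref("less"))),
          isTRUE(all.equal(pright[idx], ref("greater"))),
          isTRUE(all.equal(ptwo[idx],   ref("two.sided"))))
```

The one-sided tails stay fully vectorized. The two-sided version calls dhyper() once per table, which avoids the per-call overhead of fisher.test() (for example, its confidence-interval computation), but it is not a single vectorized call. Doubling the smaller tail, pmin(1, 2 * pmin(pleft, pright)), keeps everything vectorized. However, it is a different definition and does not in general equal fisher.test()'s two-sided p-value.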
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907