[Rd] fisher.test(... , simulate.p.value=TRUE) (PR#10558)

Tue Jan 8 10:25:04 CET 2008

Full_Name: Robin Hankin
Version: R-2.6.1
OS: macosx 10.5.1
Submission from: (NULL) (139.166.245.10)

fisher.test() with a matrix a <- diag(1:3) has a p-value of 1/60 ~= 0.016666

Setting the simulate.p.value flag to TRUE gives a very incorrect answer:

> fisher.test(a,simulate.p.value=TRUE)$p.value
[1] 0.0004997501
> fisher.test(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 9.999e-05
> 

These values are repeatable [there is no variability].  The relevant line in
fisher.test() is reproduced below; it is the one where PVAL is set:

       n <- sum(sc)
            STATISTIC <- -sum(lfactorial(x))
            tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc), 
                as.integer(sr), as.integer(sc), as.integer(n), 
                as.integer(B), integer(nr * nc), double(n + 1), 
                integer(nc), results = double(B), PACKAGE = "stats")$results
            almost.1 <- 1 + 64 * .Machine$double.eps
            PVAL <- (1 + sum(tmp <= almost.1 * STATISTIC))/(B + 
                1)
        }

the problem is sum(tmp <= almost.1 * STATISTIC).  This line would be correct if
tmp (and STATISTIC) were positive, but they are negative (or zero).  The
fisher.test() is returning 1/(B+1) because *no* element of tmp satisfies tmp <=
almost.1 * STATISTIC.

 Changing the line to read 

PVAL <- (1 + sum(tmp <=  STATISTIC/almost.1))/(B +  1)

fixes the problem:

> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01749825
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01779822
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01559844
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01739826
> 

See how the p-values agree with the theoretical value of 1/60 (more or less).