[Rd] fisher.test(... , simulate.p.value=TRUE) (PR#10558)
r.hankin at noc.soton.ac.uk
r.hankin at noc.soton.ac.uk
Tue Jan 8 10:25:04 CET 2008
Full_Name: Robin Hankin
Version: R-2.6.1
OS: macosx 10.5.1
Submission from: (NULL) (139.166.245.10)
fisher.test() with a matrix a <- diag(1:3) has a p-value of 1/60 ~= 0.016666
Setting the simulate.p.value flag to TRUE gives a very incorrect answer:
> fisher.test(a,simulate.p.value=TRUE)$p.value
[1] 0.0004997501
> fisher.test(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 9.999e-05
>
These values are repeatable [there is no variability]. The relevant line in
fisher.test() is reproduced below; it is the one where PVAL is set:
n <- sum(sc)
STATISTIC <- -sum(lfactorial(x))
tmp <- .C("fisher_sim", as.integer(nr), as.integer(nc),
as.integer(sr), as.integer(sc), as.integer(n),
as.integer(B), integer(nr * nc), double(n + 1),
integer(nc), results = double(B), PACKAGE = "stats")$results
almost.1 <- 1 + 64 * .Machine$double.eps
PVAL <- (1 + sum(tmp <= almost.1 * STATISTIC))/(B +
1)
}
the problem is sum(tmp <= almost.1 * STATISTIC). This line would be correct if
tmp (and STATISTIC) were positive, but they are negative (or zero). The
fisher.test() is returning 1/(B+1) because *no* element of tmp satisfies tmp <=
almost.1 * STATISTIC.
Changing the line to read
PVAL <- (1 + sum(tmp <= STATISTIC/almost.1))/(B + 1)
fixes the problem:
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01749825
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01779822
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01559844
> myfish(a,simulate.p.value=TRUE,B=10000)$p.value
[1] 0.01739826
>
See how the p-values agree with the theoretical value of 1/60 (more or less).
More information about the R-devel
mailing list