[R] fisher exact vs. simulated chi-square
Adaikalavan Ramasamy
gisar at nus.edu.sg
Tue Apr 22 15:04:08 CEST 2003
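There is a reason why this is called Fisher's Exact test: it is EXACT.
It considers all the possible tables (outcomes) that permutations of the
data can produce while keeping the row and column totals fixed. Then it
calculates the p-value as the probability of obtaining a table at least as
extreme as (that is, no more probable than) the observed one.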
Look for Fisher's Exact test under Section 8a of
http://faculty.vassar.edu/lowry/webtext.html.
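To make the enumeration concrete, here is a minimal sketch for the 2x2 case only; the helper name fisherTwoByTwo and the small example table are illustrative assumptions, not taken from the thread. With all margins fixed, a 2x2 table is determined by its top-left cell, which follows a hypergeometric distribution, so the two-sided p-value is the sum of the probabilities of every table no more probable than the observed one. The small relative tolerance is an assumed guard against floating-point ties.

```r
# Hypothetical helper: two-sided Fisher exact p-value for a 2x2 table,
# computed by enumerating every table with the same margins.
fisherTwoByTwo <- function(tab) {
  r1 <- sum(tab[1, ])   # first row total
  r2 <- sum(tab[2, ])   # second row total
  c1 <- sum(tab[, 1])   # first column total
  # With the margins fixed, the whole table is determined by its
  # top-left cell, which can take these values:
  support <- max(0, c1 - r2):min(r1, c1)
  # Hypergeometric probability of each possible table
  probs <- dhyper(support, r1, r2, c1)
  pObs <- dhyper(tab[1, 1], r1, r2, c1)
  # Add up the tables that are no more probable than the observed one
  sum(probs[probs <= pObs * (1 + 1e-7)])
}

tea <- matrix(c(3, 1, 1, 3), nrow = 2)
fisherTwoByTwo(tea)          # 34/70, about 0.4857
fisher.test(tea)$p.value     # should agree
```

Larger r x c tables such as Dirk's 3x3 have far too many tables to list this naively, which is why fisher.test uses a more efficient algorithm for them.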
Fisher's test is non-parametric and exact, but enumerating all those
tables can be computationally intensive. For large counts, the parametric
chi-square approximation is fine. When the expected count in a cell is too
low (chisq.test warns when any expected count is below 5), R correctly
complains that the chi-square approximation may be incorrect. Hope this
helps.
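A short sketch of where the warning comes from, using Dirk's table: the expected counts under independence are row total times column total divided by the grand total, and every cell in column A3 (column total 4) falls below 5. The simulated chisq.test run quoted below reports df = NA and gives no warning, which is consistent with the check only applying when the asymptotic chi-square distribution is used.

```r
ta <- matrix(c(45, 85, 27, 32, 40, 34, 1, 2, 1), ncol = 3,
             dimnames = list(c("A", "B", "C"), c("A1", "A2", "A3")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(ta), colSums(ta)) / sum(ta)
round(expected, 2)   # column A3 is roughly 1.17, 1.90, 0.93

# Cells that trigger the "approximation may be incorrect" warning
which(expected < 5, arr.ind = TRUE)

# chisq.test returns the same matrix as its `expected` component
all.equal(expected, suppressWarnings(chisq.test(ta))$expected,
          check.attributes = FALSE)
```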
Regards, Adai.
-----Original Message-----
From: Dirk Janssen [mailto:dirkj at rz.uni-leipzig.de]
Sent: Tuesday, April 22, 2003 8:08 PM
To: r-help at stat.math.ethz.ch
Subject: [R] fisher exact vs. simulated chi-square
Dear All,
I have a problem understanding the difference between the outcome of a
Fisher exact test and a chi-square test (with simulated p-value).
For some sample data (see below), fisher.test reports p=.02337. The normal
chi-square test complains about "approximation may be incorrect",
because there is a column with cells with very small values. I therefore
tried the chi-square with simulated p-values, but this still gives
p=.04037. I also simulated the p-value myself, using r2dtable, getting
the same result, p=0.04 (approx).
Why is this substantially higher than what the Fisher exact test gives? Do the
two tests make different assumptions? I noticed that the discrepancy
gets smaller when I increase the number of observations for column A3.
Does this mean that the simulated chi-square is still sensitive to cells
with small counts, even though it does not give me the warning?
Thanks in advance,
Dirk Janssen
------------------------------------------------------------------
> ta <- matrix(c(45,85,27,32,40,34,1,2,1), ncol=3,
+              dimnames=list(c("A","B","C"),c("A1","A2","A3")))
> ta
  A1 A2 A3
A 45 32  1
B 85 40  2
C 27 34  1
> fisher.test(ta)
Fisher's Exact Test for Count Data
data: ta
p-value = 0.02337
alternative hypothesis: two.sided
> chisq.test(ta, simulate.p.value=TRUE, B=100000)
Pearson's Chi-squared test with simulated p-value (based on 1e+05
replicates)
data: ta
X-squared = 9.6976, df = NA, p-value = 0.04037
> chisq.test(ta)
Pearson's Chi-squared test
data: ta
X-squared = 9.6976, df = 4, p-value = 0.04584
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(ta)
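For reference, the asymptotic p-value above can be reproduced by hand: Pearson's X^2 is the sum of (observed - expected)^2 / expected over all cells, and it is compared with a chi-square distribution on (rows - 1) * (columns - 1) = 4 degrees of freedom. This is only a sketch of that arithmetic; for 4 degrees of freedom the upper tail also has the closed form exp(-x/2) * (1 + x/2), included as a cross-check.

```r
ta <- matrix(c(45, 85, 27, 32, 40, 34, 1, 2, 1), ncol = 3,
             dimnames = list(c("A", "B", "C"), c("A1", "A2", "A3")))

expected <- outer(rowSums(ta), colSums(ta)) / sum(ta)
X2 <- sum((ta - expected)^2 / expected)    # Pearson statistic
df <- (nrow(ta) - 1) * (ncol(ta) - 1)      # 4 for a 3 x 3 table

# Upper tail of the chi-square distribution with df degrees of freedom
pAsym <- pchisq(X2, df = df, lower.tail = FALSE)

# Closed form of that tail for df = 4
pClosed <- exp(-X2 / 2) * (1 + X2 / 2)

c(X2 = X2, p = pAsym, closedForm = pClosed)
```

Almost all of X^2 here comes from columns A1 and A2; the sparse A3 cells contribute very little to the statistic itself.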
# simulate p-values by hand, based on the r2dtable example
> expected <- outer(rowSums(ta), colSums(ta), "*") / sum(ta)
> meanSqResid <- function(x) mean((x - expected) ^ 2 / expected)
> sum(sapply(r2dtable(100000, rowSums(ta), colSums(ta)), meanSqResid)
+     >= meanSqResid(ta)) / 100000
[1] 0.03939
# is similar to
> sum(sapply(r2dtable(100000, rowSums(ta), colSums(ta)),
+            function(x) chisq.test(x)$statistic)
+     >= 9.6976) / 100000
[1] 0.04044
There were 50 or more warnings (use warnings() to see the first 50)
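One way to see where the remaining gap may come from is to extend the by-hand simulation above. Both p-values refer tables to the same reference set, all tables with the observed margins, which is what r2dtable samples from; what differs is the rule for a table counting as "at least as extreme". fisher.test counts tables that are no more probable than the observed one, whereas chisq.test counts tables whose X^2 is at least as large. The sketch below scores the same simulated tables both ways; the helper names logProbTable and pearsonX2, the seed, B and the tie tolerance are all illustrative assumptions. If this reading is right, the first estimate should land near fisher.test's 0.023 and the second near the simulated 0.04.

```r
# Hypothetical helper: log-probability of a table under independence with
# fixed margins (multivariate hypergeometric), the ordering fisher.test uses
logProbTable <- function(x) {
  sum(lfactorial(rowSums(x))) + sum(lfactorial(colSums(x))) -
    lfactorial(sum(x)) - sum(lfactorial(x))
}

# Hypothetical helper: Pearson's X^2, expected counts from the table's margins
pearsonX2 <- function(x) {
  e <- outer(rowSums(x), colSums(x)) / sum(x)
  sum((x - e)^2 / e)
}

ta <- matrix(c(45, 85, 27, 32, 40, 34, 1, 2, 1), ncol = 3,
             dimnames = list(c("A", "B", "C"), c("A1", "A2", "A3")))

set.seed(42)
B <- 20000
sims <- r2dtable(B, rowSums(ta), colSums(ta))  # random tables with ta's margins

tol  <- 1e-7                      # guard against floating-point ties
logP <- sapply(sims, logProbTable)
x2   <- sapply(sims, pearsonX2)

# Same simulated tables, two different notions of "at least as extreme"
pFisherMC <- mean(logP <= logProbTable(ta) + tol)  # no more probable
pChisqMC  <- mean(x2 >= pearsonX2(ta) - tol)       # X^2 at least as large

c(fisher = pFisherMC, chisq = pChisqMC)
```

Since the reference distribution is identical in both cases, a difference between the two estimates would reflect the choice of test statistic rather than a problem with small counts in the simulation.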
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help