[Rd] The two chisq.test p values differ when the contingency table
is transposed! (PR#3486)
dmurdoch at pair.com
dmurdoch at pair.com
Fri Aug 15 17:42:14 MEST 2003
>Date: Wed, 16 Jul 2003 01:27:25 +0200 (MET DST)
>From: shitao at ucla.edu
>> x
> [,1] [,2]
>[1,] 149 151
>[2,] 1 8
>> c2x<-chisq.test(x, simulate.p.value=T, B=100000)$p.value
>> for(i in (1:20)){c2x<-c(c2x,chisq.test(x,
>simulate.p.value=T,B=100000)$p.value)}
>> c2tx<-chisq.test(t(x), simulate.p.value=T, B=100000)$p.value
>> for(i in (1:20)){c2tx<-c(c2tx,chisq.test(t(x), simulate.p.value=T,
>+ B=100000)$p.value)}
>> cbind(c2x,c2tx)
> c2x c2tx
> [1,] 0.03711 0.01683
> [2,] 0.03717 0.01713
The problem is in ctest/R/chisq.test.R, where the p-value is
calculated as
STATISTIC <- sum((x - E) ^ 2 / E)
PARAMETER <- NA
PVAL <- sum(tmp$results >= STATISTIC) / B
Here tmp$results is a collection of simulated chisquare values, but
because of different rounding, the statistics corresponding to tables
equal to the observed table are slightly smaller than the value
calculated in STATISTIC, and effectively the p-value is calcuated as
PVAL <- sum(tmp$results > STATISTIC) / B
instead.
What's the appropriate fix here?
PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B
works on this example, but is there something better?
Duncan Murdoch
More information about the R-devel
mailing list