[Rd] The two chisq.test p values differ when the contingency table is transposed! (PR#3486)

dmurdoch at pair.com dmurdoch at pair.com
Fri Aug 15 17:42:14 MEST 2003


>Date: Wed, 16 Jul 2003 01:27:25 +0200 (MET DST)
>From: shitao at ucla.edu

>> x
>     [,1] [,2]
>[1,]  149  151
>[2,]    1    8
>> c2x<-chisq.test(x, simulate.p.value=T, B=100000)$p.value
>> for(i in (1:20)){c2x<-c(c2x,chisq.test(x,
>simulate.p.value=T,B=100000)$p.value)}
>> c2tx<-chisq.test(t(x), simulate.p.value=T, B=100000)$p.value
>> for(i in (1:20)){c2tx<-c(c2tx,chisq.test(t(x), simulate.p.value=T,
>+ B=100000)$p.value)}
>> cbind(c2x,c2tx)
>          c2x    c2tx
> [1,] 0.03711 0.01683
> [2,] 0.03717 0.01713

The problem is in ctest/R/chisq.test.R, where the p-value is
calculated as 

            STATISTIC <- sum((x - E) ^ 2 / E)
            PARAMETER <- NA
            PVAL <- sum(tmp$results >= STATISTIC) / B

Here tmp$results is a collection of simulated chisquare values, but
because of different rounding, the statistics corresponding to tables
equal to the observed table are slightly smaller than the value
calculated in STATISTIC, and effectively the p-value is calcuated as

             PVAL <- sum(tmp$results > STATISTIC) / B

instead.

What's the appropriate fix here? 

PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B

works on this example, but is there something better?

Duncan Murdoch



More information about the R-devel mailing list