[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)

maechler at stat.math.ethz.ch
Thu Oct 20 11:00:02 CEST 2005


Thank you, Douglas (and Simone), for the bug report.

>>>>> "Simone" == Simone Giannerini <sgiannerini at gmail.com>
>>>>>     on Thu, 20 Oct 2005 10:10:01 +0200 writes:

    Simone> Hi,
    Simone> I obtain the same result under Win. XP SP2 on AMD 64 3700+

    Simone> platform i386-pc-mingw32
    Simone> arch     i386
    Simone> os       mingw32
    Simone> system   i386, mingw32
    Simone> status
    Simone> major    2
    Simone> minor    2.0
    Simone> year     2005
    Simone> month    10
    Simone> day      06
    Simone> svn rev  35749
    Simone> language R

    >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
    Simone> [1] 0.3598201

    >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
    Simone> [1] 0.0004997501

    >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
    Simone> [1] 0.3403298

So, it's only the middle matrix giving problems for you.

It doesn't for me on an AMD 64-bit platform, even over 1000 repeated runs:

> m <- cbind(1:0, c(7,16))
> summary(p <- replicate(1000, chisq.test(m, sim=TRUE)$p.value))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3043  0.3268  0.3338  0.3337  0.3405  0.3663 

but it does show the problem on a 32-bit one.
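Why the 64-bit summary centres near 1/3: in the middle table the first column total is 1, so a table with the same margins can only put that single count in row 1 (the observed table) or in row 2. Only the observed table reaches the observed statistic, so the simulated p-value should estimate the probability of redrawing it, 8/24 = 1/3. A small R sketch of that calculation, assuming the Pearson statistic without continuity correction and the (1 + hits)/(B + 1) p-value form with the default B = 2000; the helper `pearson()` is hypothetical:

```r
# Middle table from the report: column 1 has total 1, so every table with
# the same margins is determined by which row holds that single count.
m <- cbind(1:0, c(7, 16))
rs <- rowSums(m)   # 8, 16
cs <- colSums(m)   # 1, 23

# Hypothetical helper: Pearson statistic with expected counts from margins
pearson <- function(x) {
  E <- outer(rowSums(x), colSums(x)) / sum(x)
  sum((x - E)^2 / E)
}

obs   <- m                       # the count sits in row 1
other <- cbind(0:1, c(8, 15))    # the only other table with these margins

stat_obs   <- pearson(obs)       # 48/23, about 2.087
stat_other <- pearson(other)     # 12/23, about 0.522

# Probability that a random table with these margins equals 'obs':
# hypergeometric, one draw from 8 row-1 and 16 row-2 units
p_same <- dhyper(1, rs[1], rs[2], cs[1])   # 8/24 = 1/3

# Expected simulated p-value, assuming (1 + hits) / (B + 1) with B = 2000
B <- 2000
p_expected <- (1 + B * p_same) / (B + 1)   # 2003/6003, about 0.3337
```

The expected value, about 0.3337, agrees with the 1000-run summary; a p-value of 0.0004997501 instead means the redrawn copies of the observed table were not counted.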

A fix is easy and will be in R-patched (and R-devel of
course) soon.
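One way such a fix can look, sketched here as an assumption rather than the actual R-patched change: count a simulated statistic as a tie when it falls within a small relative tolerance of the observed one. The function `sim_pvalue()` and the tolerance `64 * .Machine$double.eps` are illustrative choices, not names from R's sources:

```r
# Hypothetical helper (not R's actual code): Monte Carlo p-value of the form
# (1 + #{sim >= obs}) / (B + 1), comparing against a slightly shrunken
# observed statistic so that values equal up to rounding still count as ties.
sim_pvalue <- function(sim_stats, obs_stat, tol = 64 * .Machine$double.eps) {
  B <- length(sim_stats)
  almost_obs <- obs_stat * (1 - tol)
  (1 + sum(sim_stats >= almost_obs)) / (B + 1)
}

obs <- 48 / 23
# Two simulated copies of the observed table whose statistic came out
# 4 * .Machine$double.eps low, plus two copies of the other table
sims <- c(rep(obs - 4 * .Machine$double.eps, 2), 12 / 23, 12 / 23)

naive <- (1 + sum(sims >= obs)) / (length(sims) + 1)  # 1/5: ties lost
fixed <- sim_pvalue(sims, obs)                        # 3/5: ties kept
```

The plain `>=` comparison drops the rounded-down copies of the observed table, which is the failure mode in this report; the relative tolerance keeps them while still excluding genuinely smaller statistics.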

Martin Maechler, ETH Zurich


    Simone> Ciao
    Simone> Simone

    Simone> On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote:
    >> Hi,
    >> 
    >> This report concerns p-values from chisq.test with the
    >> simulate.p.value=TRUE option.  The issue is numerical accuracy
    >> and was raised in previous bug reports 3486 and 3896.
    >> The bug was considered fixed but apparently was only mostly
    >> fixed.  It is the typical problem of two values that are
    >> mathematically equal not ending up numerically equal.
    >> 
    >> Consider this series of three 2x2 tables:
    >> 
    >>      [,1] [,2]
    >> [1,]    1    7
    >> [2,]    0   15
    >> 
    >>      [,1] [,2]
    >> [1,]    1    7
    >> [2,]    0   16
    >> 
    >>      [,1] [,2]
    >> [1,]    1    7
    >> [2,]    0   17
    >> 
    >> 
    >> The p-values returned by chisq.test(m, sim=TRUE)$p.value are
    >> 0.3543228, 0.0004997501 and 0.3273363, respectively.
    >> 
    >> The 2nd seems a bit unlikely, huh?
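The suspicious value is exactly the smallest p-value the simulation can return. Assuming the (1 + hits)/(B + 1) form, where hits counts simulated statistics at least as large as the observed one, and chisq.test's default B = 2000, a sketch of the arithmetic:

```r
# Smallest possible simulated p-value: no simulated statistic counted as
# >= the observed one, with B = 2000 replicates (chisq.test's default)
B <- 2000
hits <- 0
p_floor <- (1 + hits) / (B + 1)
print(p_floor)   # [1] 0.0004997501, the value in the report
```

So none of the 2000 simulated statistics was counted as reaching the observed one, even though, for these margins, roughly a third of the simulated tables should be identical to the observed table.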
    >> 
    >> I looked into it: the value I get for the statistic
    >> (calculated in R code) is 4*.Machine$double.eps less than the
    >> value returned from the C code that does the simulation,
    >> although the two should be equal.
    >> 
    >> 
    >> Code for creating/testing the three matrices shown above:
    >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
    >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
    >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
    >> 
    >> 
    >> Running SuSE 9.3 on an AMD Athlon 4000+
    >> 
    >> 
    >> > version
    >> platform i686-pc-linux-gnu
    >> arch     i686
    >> os       linux-gnu
    >> system   i686, linux-gnu
    >> status   Patched
    >> major    2
    >> minor    1.1
    >> year     2005
    >> month    07
    >> day      29
    >> language R
    >> 
    >> 
    >> Thanks,
    >> Doug
    >> 
    >> 
    >> Douglas Grove
    >> Statistical Research Associate
    >> Fred Hutchinson Cancer Research Center
    >> Seattle WA 98109


