[Rd] numerical issues in chisq.test(simulate=TRUE) (PR#8224)

dgrove@fhcrc.org dgrove at fhcrc.org
Thu Oct 20 18:39:55 CEST 2005


Yes, only the middle matrix was problematic.  Others were included to show
what the value should (approx) be.  Sorry that I didn't mention I was
using 32 bit.  

There are of course very easy fixes to this, just wasn't sure what was
the "best" approach in this situation (i.e. wasn't sure if the usual
tolerance of sqrt(.Machine$double.eps) was too large.

Thanks much,
Doug


On Thu, 20 Oct 2005, Martin Maechler wrote:

> Thank you, Douglas (and Simone) for the bug report.
> 
> >>>>> "Simone" == Simone Giannerini <sgiannerini at gmail.com>
> >>>>>     on Thu, 20 Oct 2005 10:10:01 +0200 writes:
> 
>     Simone> Hi,
>     Simone> I obtain the same result under Win. XP SP2 on AMD 64 3700+
> 
>     Simone> platform i386-pc-mingw32
>     Simone> arch     i386
>     Simone> os       mingw32
>     Simone> system   i386, mingw32
>     Simone> status
>     Simone> major    2
>     Simone> minor    2.0
>     Simone> year     2005
>     Simone> month    10
>     Simone> day      06
>     Simone> svn rev  35749
>     Simone> language R
> 
>     >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     Simone> [1] 0.3598201
> 
>     >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     Simone> [1] 0.0004997501
> 
>     >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     Simone> [1] 0.3403298
> 
> So, it's only the middle matrix giving problems for you.
> 
> It doesn't for me on a AMD 64-bit platform, even for 1000 simulations:
> 
> > m <- cbind(1:0, c(7,16))
> > summary(p <- replicate(1000, chisq.test(m, sim=TRUE)$p.value))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>  0.3043  0.3268  0.3338  0.3337  0.3405  0.3663 
> 
> but it does show the problem on 32-bit one.
> 
> A fix is easy and will be in R-patched (and R-devel of
> course) soon.
> 
> Martin Maechler, ETH Zurich
> 
> 
>     Simone> Ciao
>     Simone> Simone
> 
>     Simone> On 10/20/05, dgrove at fhcrc.org <dgrove at fhcrc.org> wrote:
>     >> Hi,
>     >> 
>     >> This report deals with p-values coming from chisq.test using
>     >> the simulate.p=TRUE option.  The issue is numerical accuracy
>     >> and was brought up in previous bug reports 3486 and 3896.
>     >> The bug was considered fixed but apparently was only mostly
>     >> fixed.  Just the typical problem of two values that are
>     >> mathematically equal not ending up numerically equivalent.
>     >> 
>     >> Consider this series of three 2x2 tables:
>     >> 
>     >> [1,]    1    7
>     >> [2,]    0   15
>     >> 
>     >> [1,]    1    7
>     >> [2,]    0   16
>     >> 
>     >> [1,]    1    7
>     >> [2,]    0   17
>     >> 
>     >> 
>     >> The pvals returned from chisq.test(m, sim=TRUE)$p.value are
>     >> 0.3543228, 0.0004997501 and 0.3273363 respectively.
>     >> 
>     >> The 2nd seems a bit unlikely, huh?
>     >> 
>     >> I checked into it and the value I'm getting for the statistic
>     >> (calculated in R code) is 4*.Machine$double.eps less than the
>     >> value (which should be equal) that is returned from the C-code
>     >> that does the simulation.
>     >> 
>     >> 
>     >> Code for creating/testing the three matrices shown above:
>     >> m <- matrix(c(1,0,7,15),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     >> m <- matrix(c(1,0,7,16),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     >> m <- matrix(c(1,0,7,17),2,2) ; chisq.test(m, sim=TRUE)$p.value
>     >> 
>     >> 
>     >> Running SuSE9.3 on a AMD Athlon4000+
>     >> 
>     >> 
>     >> > version
>     >> platform i686-pc-linux-gnu
>     >> arch     i686
>     >> os       linux-gnu
>     >> system   i686, linux-gnu
>     >> status   Patched
>     >> major    2
>     >> minor    1.1
>     >> year     2005
>     >> month    07
>     >> day      29
>     >> language R
>     >> 
>     >> 
>     >> Thanks,
>     >> Doug
>     >> 
>     >> 
>     >> Douglas Grove
>     >> Statistical Research Associate
>     >> Fred Hutchinson Cancer Research Center
>     >> Seattle WA 98109
>



More information about the R-devel mailing list