[Rd] chisq.test with simulate.p.value=TRUE (PR#13292)

Tue Nov 18 05:22:24 CET 2008

 <constant <at> unb.br> writes:

> 
> Full_Name: Reginaldo Constantino
> Version: 2.8.0
> OS: Ubuntu Hardy (32 bit, kernel 2.6.24)
> Submission from: (NULL) (189.61.88.2)
> 
> For many tables, chisq.test with simulate.p.value=TRUE gives a p value that is
> obviously incorrect and inversely proportional to the number of replicates:
> 
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on 2000
>         replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
> 
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
> 
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
> 
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
> 
> Also tested the same R version under Windows XP and got the same results.
> 

  Could you explain why this is wrong?
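The data are extremely unlikely under the null hypothesis
(the standard chisq.test() gives p < 2.2e-16), so none of the
B simulated tables produces a statistic as large as the observed
one, and the simulated p-value sits at its floor of 1/(B+1)
(1/2001 = 0.0004998, 1/10001 is about 1e-04, and so on). As is
standard with these protocols, the observed table is counted as
one member of the ensemble of simulations, which is why the
p-value can never reach zero.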
  Why is the p value "obviously incorrect"?
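  To make the 1/(B+1) floor concrete, here is a minimal sketch that
recomputes the simulated p-value by hand, assuming the usual
(1 + #{simulated >= observed}) / (B + 1) convention. It draws tables
with the observed margins using stats::r2dtable(); the helper names
`E`, `stat`, `sims` and `p_hand` are hypothetical, and chisq.test()'s
own internals may differ in details (e.g. a small numerical tolerance
in the comparison).

```r
data(HairEyeColor)
x <- margin.table(HairEyeColor, c(1, 2))

set.seed(1)
B <- 2000
res <- chisq.test(x, simulate.p.value = TRUE, B = B)

# Expected counts under independence, from the observed margins
E <- outer(rowSums(x), colSums(x)) / sum(x)

# Pearson's X^2 for a table with the same dimensions as x
stat <- function(tab) sum((tab - E)^2 / E)
obs <- stat(x)

# Simulate B tables with the observed row/column totals and
# compute X^2 for each one
sims <- vapply(r2dtable(B, rowSums(x), colSums(x)), stat, numeric(1))

# The observed table counts as one member of the ensemble, so the
# smallest possible p-value is 1 / (B + 1)
p_hand <- (1 + sum(sims >= obs)) / (B + 1)
c(chisq.test = res$p.value, by_hand = p_hand, floor = 1 / (B + 1))
```

For a less extreme table some simulated statistics would reach the
observed one, and the p-value would settle down as B grows instead
of tracking 1/(B+1).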

  cheers
   Ben Bolker