[Rd] bug(?) in chisq.test

Spencer Graves spencer.graves at pdf.com
Tue Mar 9 16:09:11 MET 2004


      This is a message for whoever maintains "chisq.test":  For an 
outcome more extreme than 2000 simulations, a Monte Carlo p-value of "< 
2.2e-16" was printed.  Ripley said the proper p-value for such cases 
should be 1/(B+1) = 1/2001.  This can be easily fixed by adding 
"if(PVAL==0)PVAL <- 1/(B+1)" right after the following line in the code 
for chisq.test (in R 1.8.1 for Windows): 

>             PVAL <- sum(tmp$results >= STATISTIC)/B

      Thanks for all your hard work for the R Project. 
      Best Wishes,
      spencer graves

-------- Original Message --------
Subject: 	Re: [R] Monte Carlo p-value (was "question")
Date: 	Tue, 9 Mar 2004 08:17:02 +0000 (GMT)
From: 	Prof Brian Ripley <ripley at stats.ox.ac.uk>
To: 	Spencer Graves <spencer.graves at pdf.com>
CC: 	cfinet at ens-lyon.fr, r-help at stat.math.ethz.ch, kjetil at entelnet.bo



1/(B+1) is the significance level of the Monte Carlo test if the data 
give the most extreme value.  (E.g. Ripley, 1987, p. 171.)  This is a 
calculation, not a convention, and assumes a continuously distributed 
statistic.

On Mon, 8 Mar 2004, Spencer Graves wrote:

>       What is the standard convention for a Monte Carlo p-value when the 
> observed outcome is more extreme than all simulations?  The example 
> provided by Cédric Fine produced a Monte Carlo p-value, according to 
> Kjetil Halvorsen, of "2.2e-16 (based on 2000 replicates)".  This seems 
> inappropriate to me. 
> 
>       By reading the code, I found that for this case,
> 
>             PVAL <- sum(tmp$results >= STATISTIC)/B
> 
> where STATISTIC is the observed chi-square while "tmp$results" is a 
> vector of length B of chi-squares from Monte Carlo simulated tables with 
> the same marginals.  Thus, PVAL ranges over seq(0, 1, length=(B+1)).  
> For the observed table, presumably, PVAL = 0.  The function "chisq.test" 
> apparently returns an object of class "htest", and "print.htest" calls 
> "format.pval(x$p.value, digits = digits)", for which "format.pval(0, 4)" 
> is "< 2.2e-16". 
> 
>       Can someone provide an appropriate reference or sense of the 
> literature on the appropriate number to report for a Monte Carlo p-value 
> when the observed is more extreme than all the simulations?  A value of 
> 0 or "< 2.2e-16" violates my sense of the logic of this situation.  If 
> the appropriate number is 0.5/B, then the line "PVAL <- sum(tmp$results 
>  >= STATISTIC)/B" could be followed immediately by something like the 
> following: 
> 
>       if(PVAL==0) PVAL <- 0.5/B
> 
>       Comments?
>       Best Wishes,
>       Spencer Graves
> 
> kjetil at entelnet.bo wrote:
> 
> >On 8 Mar 2004 at 16:38, cfinet at ens-lyon.fr wrote:
> >
> >  
> >
> >>I do not manage to make a Fisher´s exact test with the next matrix :
> >>
> >>    
> >>
> >
> >You can consider using chisq.test() with the argument sim=TRUE:
> >
> >  
> >
> >>mat <- matrix(scan(), 7, 12, byrow=TRUE)
> >>    
> >>
> >1:  1    3    0    1    2    9    0    0    2     5     8     6
> >13:    0    3    3    0    0    0    0    5    0     3     0     0
> >25:    0    0    0    0    0   10    0    0    0     0    10     0
> >37:    0    2    0    2    6   14    0    0    6     0    10     6
> >49:    0    5    0    0    0    7    0    0    0     2     8     0
> >61:    0    0    1    9    4    7    2    1    4     2    12     5
> >73:    0    6    0    0    0    0    0    0    0     5     1     3
> >85: 
> >Read 84 items
> >
> >  
> >
> >>chisq.test(mat, sim=TRUE)
> >>    
> >>
> >
> >        Pearson's Chi-squared test with simulated p-value (based
> >        on 2000 replicates)
> >
> >data:  mat 
> >X-squared = 218.3366, df = NA, p-value = < 2.2e-16
> >
> >Kjetil Halvorsen
> >
> >
> >
> >  
> >
> >>    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
> >>[1,]    1    3    0    1    2    9    0    0    2     5     8     6
> >>[2,]    0    3    3    0    0    0    0    5    0     3     0     0
> >>[3,]    0    0    0    0    0   10    0    0    0     0    10     0
> >>[4,]    0    2    0    2    6   14    0    0    6     0    10     6
> >>[5,]    0    5    0    0    0    7    0    0    0     2     8     0
> >>[6,]    0    0    1    9    4    7    2    1    4     2    12     5
> >>[7,]    0    6    0    0    0    0    0    0    0     5     1     3
> >>
> >>but I do not understand why it does not work since I obtain the next
> >>error message :
> >>
> >>
> >>
> >>    
> >>
> >>>fisher.test(enfin.matrix)
> >>>      
> >>>
> >>Error in fisher.test(enfin.matrix) : Bug in FEXACT: gave negative key
> >>
> >>thank you for considering my application
> >>
> >>cédric finet
> >>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide!
> >>http://www.R-project.org/posting-guide.html
> >>    
> >>
> >
> >______________________________________________
> >R-help at stat.math.ethz.ch mailing list
> >https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >  
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



More information about the R-devel mailing list