[R] Fisher's Exact Test

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Mon Feb 15 12:42:10 CET 1999

Simon Fear <fears at roycastle.liv.ac.uk> writes:

> Probably Kurt Hornik, the author of this package, will have already
> replied to this. The ctest package has indeed been recently updated (and
> a bug was fixed in the Windows binary to boot). Though not immediately
> an R question, I wanted to comment on Peter's
> >Mind you, there's a paper by Yates lying somewhere in my "must read
> >some time" stack where he argues that the 2 * p procedure is more
> >correct...
> which becomes an R question if you think that there is such a thing as
> Kurt's package being "correct". Try this definition of a P value: the
> probability, under the null hypothesis, of observing data as, or more,
> extreme than that actually observed. If you buy this frequently
> quoted/paraphrased definition, and I suggest most people would, then you
> should go with the latest version of ctest, i.e. summing the probabilities
> of all tables with probability of occurrence less than or equal to the
> probability of that observed (under the null hypothesis). BUT if you
> want your P value to be such that in the long run your type I error is
> exactly what you quote, then you use the double-one-sided P. (Actually
> you should use double-one-sided-MID-P to get even closer to stated type
> I error asymptotically overall, though for certain null hypothesis
> values you then sometimes under-quote.)
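The competing two-sided recipes quoted above can be compared directly on a 2x2 table. Below is a minimal Python sketch, not the ctest code: assuming all margins are fixed, it builds the hypergeometric null distribution of the top-left cell and returns the point-probability p-value, the doubled one-sided p-value and the doubled mid-p. The function name `fisher_two_sided` and the tolerance `rel_tol` are hypothetical choices made for this illustration.

```python
from math import comb


def fisher_two_sided(table, rel_tol=1e-7):
    """Two-sided p-values for a 2x2 table [[a, b], [c, d]] (illustrative sketch)."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c  # row totals and first column total
    n = r1 + r2
    # With all margins fixed, the top-left cell is hypergeometric under H0.
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = comb(n, c1)
    pmf = {k: comb(r1, k) * comb(r2, c1 - k) / total for k in range(lo, hi + 1)}

    p_obs = pmf[a]
    lower = sum(p for k, p in pmf.items() if k <= a)  # P(A <= a)
    upper = sum(p for k, p in pmf.items() if k >= a)  # P(A >= a)
    smaller_tail = min(lower, upper)

    # Point-probability rule: add up every table that is no more probable
    # than the observed one (rel_tol, an assumption, guards against float ties).
    point = sum(p for p in pmf.values() if p <= p_obs * (1 + rel_tol))
    # Doubled one-sided rule: twice the smaller tail, capped at 1.
    doubled = min(1.0, 2 * smaller_tail)
    # Doubled mid-p: the observed table contributes only half its probability.
    doubled_mid = min(1.0, 2 * (smaller_tail - 0.5 * p_obs))
    return {"point": min(1.0, point), "doubled": doubled, "doubled_mid": doubled_mid}


if __name__ == "__main__":
    print(fisher_two_sided([[0, 3], [4, 3]]))
```

For the skewed table [[0, 3], [4, 3]] the three rules give 0.2, 1/3 and 1/6 (up to floating-point rounding), while for a symmetric table such as [[3, 1], [1, 3]] the point-probability and doubled values coincide at 34/70.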

Yes. There is also the "tail balance" version, i.e. you look in the opposite
tail for all the cases whose *tail* probability is less than that of the
observed value. I tend to like this one because of the difficulty with
distributions like the following:

x   P(X = x)
0   0.02
1   0.05
2   0.10
...
7   0.019
8   0.015
9   0.010
10  0.008

i.e. P(X <= 0) = 0.02, while P(X >= 7) = 0.052. Is 7 really as extreme as 0?

Fortunately, none of the standard distributions are as nasty as that,
but conceptually there's a problem. The tail-balance method would only
count 9 and 10 as being as extreme as 0 (P(X >= 9) = 0.018 does not exceed
P(X <= 0) = 0.02, whereas P(X >= 8) = 0.033 does), which I feel is much
better, although it unfortunately doesn't generalise to several dimensions
the way the point probability method does.
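To make the comparison concrete, here is a small Python sketch that applies the point-probability, doubling and tail-balance rules to the toy distribution above, with 0 as the observed value. The values 3 to 6 are not listed in the example; the sketch assumes each of them has probability above 0.02, so they never enter any of the sums and can be left out. The function names are hypothetical and do not come from R or ctest.

```python
# Toy null distribution from the post. Values 3..6 are not listed there;
# this sketch assumes each has probability above 0.02, so leaving them out
# does not change any of the sums below.
PMF = {0: 0.02, 1: 0.05, 2: 0.10, 7: 0.019, 8: 0.015, 9: 0.010, 10: 0.008}


def lower_tail(pmf, x):
    return sum(p for k, p in pmf.items() if k <= x)


def upper_tail(pmf, x):
    return sum(p for k, p in pmf.items() if k >= x)


def point_probability_p(pmf, x, rel_tol=1e-7):
    """Sum P(X = k) over every k that is no more probable than the observed x."""
    cutoff = pmf[x] * (1 + rel_tol)
    return sum(p for p in pmf.values() if p <= cutoff)


def doubled_p(pmf, x):
    """Twice the smaller one-sided tail, capped at 1."""
    return min(1.0, 2 * min(lower_tail(pmf, x), upper_tail(pmf, x)))


def tail_balance_p(pmf, x):
    """Observed lower tail plus the largest opposite (upper) tail not exceeding it.

    Assumes x sits in the lower tail, as in the example; ties count as extreme.
    """
    observed = lower_tail(pmf, x)
    opposite = [upper_tail(pmf, k) for k in pmf if k > x]
    # Upper tails are nested, so the largest qualifying one is the extra mass.
    return observed + max((t for t in opposite if t <= observed), default=0.0)


for rule in (point_probability_p, doubled_p, tail_balance_p):
    print(rule.__name__, round(rule(PMF, 0), 3))
```

Run as a script, it prints 0.072 for the point-probability rule (0 plus 7, 8, 9 and 10), 0.04 for doubling, and 0.038 for tail balance (0 plus only 9 and 10).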

Actually, the old fisher.test *tried* to do this, but had a bug,
causing it to sum the same tail twice... 

BTW: The reference for Yates' paper is JRSS A (1984), 426--463 (incl.
discussion). In particular the discussion section is full of
"quotation value", e.g. Finney:

"We should be grateful to Dr Yates for his characteristically
realistic account of matters that in recent years others have tended
to make increasingly obscure."
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907