[R] strange fisher.test result

Tue Apr 3 18:15:44 CEST 2007

Thomas Lumley wrote:
> On Mon, 2 Apr 2007, ted.harding at nessie.mcc.ac.uk wrote:
>   
>> From the above, the marginal totals for his 2x2 table
>>
>>  a   b   =   16    8
>>
>>  c   d       15   24
>>
>> are (rows then columns) 24,39,31,32
>>
>> These fixed marginals mean that the whole table is determined
>> by the value of a. The following function P.FX() computes the
>> probabilities of all possible tables, conditional on the
>> marginal totals (it is much more transparent than the code
>> for the same purpose in fisher.test()):
>>     
>
> As this example has shown, 2x2 tables are a nice opportunity for 
> illustrating how the ordering of the sample space affects inference 
> (because you can actually see the whole sample space).
>
> I used something like this as a term project in an introductory R class, 
> where we wrote code to compute the probabilities for all outcomes 
> conditional on one margin, and used this to get (conservative) exact 
> versions of all the popular tests in 2x2 tables.  It's interesting to do 
> things like compare the rejection regions and power under various 
> alternatives for the exact versions of the likelihood ratio test and 
> Fisher's test.  We didn't get as far as confidence intervals, but the code 
> is at
>     http://faculty.washington.edu/tlumley/b514/exacttest.R
> with .Rd files at 
>     http://faculty.washington.edu/tlumley/b514/man/
>   
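For readers without the quoted P.FX() function at hand, here is a minimal sketch of the same kind of calculation. It is not that function, and not the code in fisher.test() either; the variable names are made up for illustration. It assumes the null hypothesis of odds ratio 1, so that the conditional distribution of a given the margins is hypergeometric. It also computes the two-sided p-value under two different conventions for what counts as "at least as extreme", which need not agree.

```r
# Observed table from the quoted message: rows (16, 8) and (15, 24)
tab <- matrix(c(16, 15, 8, 24), nrow = 2)
a.obs <- tab[1, 1]
r1 <- sum(tab[1, ])   # first row total, 24
c1 <- sum(tab[, 1])   # first column total, 31
c2 <- sum(tab[, 2])   # second column total, 32

# With all margins fixed, every possible table is indexed by a alone
a <- max(0, r1 - c2):min(r1, c1)

# Null probability of each table (hypergeometric, odds ratio 1 assumed)
prob <- dhyper(a, c1, c2, r1)

# Convention 1: sum over tables no more probable than the observed one
# (the small tolerance guards against ties broken by rounding error)
p.fisher <- sum(prob[prob <= prob[a == a.obs] * (1 + 1e-7)])

# Convention 2: double the smaller one-sided tail, capped at 1
p.doubled <- min(1, 2 * min(sum(prob[a >= a.obs]), sum(prob[a <= a.obs])))

c(p.fisher = p.fisher, p.doubled = p.doubled,
  fisher.test = fisher.test(tab)$p.value)
```

The first convention orders the sample space by probability and should match what fisher.test() reports for a 2x2 table; the second orders it by tail area.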
The effect is already visible with binomial tests. In fact the last 
exercise in the section on categorical data in Introductory Statistics 
with R currently reads (the \Answer section is not in the actual book -- 
yet):

 Make a plot of the two-sided $p$ value for
  testing that the probability parameter is $x$ when the observations
  are 3 successes in 15 trials, for $x$ varying from 0 to 1 in steps of
  0.001. Explain what makes the definition of a two-sided confidence
  interval difficult.

  \Answer The curve shows substantial discontinuities where
  probability mass is shifted from one tail to the other, and also a
  number of local minima. A confidence region could be defined as
  those values of the probability parameter that are not rejected at
  level $\alpha$, but for some $\alpha$ that region is not an interval.

   # Two-sided p-value for 3 successes in 15 trials, over a grid of null values
   p <- seq(0, 1, 0.001)
   pval <- sapply(p, function(p0) binom.test(3, 15, p = p0)$p.value)
   plot(p, pval, type = "l")
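
The claim about the confidence region can also be checked numerically. The following is a minimal sketch, not part of the book: it recomputes the p-value curve and, for each level on an arbitrarily chosen grid of alpha values, counts how many separate pieces the set of non-rejected parameter values breaks into. The names n.pieces, alphas and pieces are made up for illustration, and the answer is only as fine as the 0.001 grid.

```r
p <- seq(0, 1, 0.001)
pval <- sapply(p, function(p0) binom.test(3, 15, p = p0)$p.value)

# Count the maximal runs of consecutive grid points that are not
# rejected at level alpha; more than one run means the region is
# not an interval (on this grid)
n.pieces <- function(alpha) {
  runs <- rle(pval > alpha)
  sum(runs$values)
}

alphas <- seq(0.01, 0.99, by = 0.01)   # assumed grid of levels
pieces <- sapply(alphas, n.pieces)

# Levels at which the non-rejected set splits into several pieces
alphas[pieces > 1]
```

Any alpha listed in the last line is a level at which inverting the two-sided test gives a union of disjoint intervals rather than a single one.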