[R] binom.test() query

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Thu Apr 5 23:35:44 CEST 2007


Hi Folks,
The recent correspondence about "strange fisher.test result",
and especially Peter Dalgaard's reply on Tue 03 April 2007
(which I want to investigate further) led me to take a close
look at the code for binom.test().

I now have a query!

The code for the two-sided case computes the p-value as follows:

      if (p == 0) (x == 0)
      else
        if (p == 1) (x == n) 
        else {
          relErr <- 1 + 1e-07
          d <- dbinom(x, n, p)
          m <- n * p 
          if (x == m) 1
          else
## PVAL if x < m
            if (x < m) {
              i <- seq(from = ceiling(m), to = n)
              y <- sum(dbinom(i, n, p) <= d * relErr)
   ## here:
              pbinom(x,n,p) + pbinom(n-y,n,p,lower=FALSE)
            }
## PVAL if x > m:
            else {
              i <- seq(from = 0, to = floor(m))
              y <- sum(dbinom(i, n, p) <= d * relErr)
   ## here:
              pbinom(y-1,n,p) + pbinom(x-1,n,p,lower=FALSE)
            }
          }
        }

The case "x < m" will do for my query (since it is the same
for the case "x > m").

If I understand the code correctly, when x (the observed binomial
number) is less than m (its expected value on the null hypothesis
being tested), then the p-value is the total probability in the
lower tail, from 0 to x inclusive, plus a "bit".

The "bit" is the probability in the upper tail of those value
from (n-y) to n, where y is the number of values of i greater
than or equal to (if m is an integer) m such that

   dbinom(i,n,p) <= d*(1 + 1e-07) = dbinom(x,n,p)*(1 + 1e-07).

or, in other words, ignoring the "1e-07", the values of i
which have binomial probability (on the NH) less than or equal
to the probability of x.

So, for x < m = n*p, the lower-tail contribution to the p-value
is the probability that X <= x, while the upper-tail contribution
is the probability that (X >= m)&(dbinom(X,n,p) <= dbinom(x,n,p)).

Now that is not so asymmetric as it looks, since for x < m = n*p
the values of dbinom(i,n,p) decrease as i decreases for i < x,
so in fact the left=hand tail is similarly the probability
that (X <= m)&(dbinom(X,n,p) <= dbinom(x,n,p)). In effect, therefore,
ignoring the "1e-07", we are simply calculating the probability
of "dbinom(X,n,p) <= dbinom(x,n,p)" overall.

But what is the purpose of the "1e-07"? I can see that it might
cause a miscalculation if the real intention is to simply sum
the probabilties dbinom(X,n,p) which are less than or equal to
dbinom(x,n,p).

For example (and I'm being a bit mischievous here, but I'm
theoretically entitled to be):

   p<-0.3527785166
   n<-20
   x<-(4:10)
   print(cbind(x,dbinom(x,n,p)),digits=10)

      x              
[1,]  4 0.07114577135
[2,]  5 0.12409328342
[3,]  6 0.16909761793
[4,]  7 0.18433877225
[5,]  8 0.16327483787
[6,]  9 0.11866078116
[7,] 10 0.07114577154

Now I run the standard binom.test():

x<-4 ; n<-20 ; p<-0.3527785166 ; binom.test(x,n,p)$p.value
[1] 0.2405347


Next, I modify the code for binom.test so that relErr <- 1
(instead of 1 + 1e-07), and I call this new function
binom.test.0 and run it on the same data:

binom.test.0(x,n,p)$p.value
[1] 0.1693889

The difference is the value of dbinom(10,n,p) = 0.07114577...

So the "1e-07" has done its dirty work -- it has added 40% to
my p-value! Admittedly it wasn't particularly significant to
start with, at 0.17 (in round numbers); but it's much less so
at 0.24, and if I wanted I'm sure I could construct a really
"significant" example!

Anyway, I dare say the factor (1 + 1e-07) is there as a precaution
against some possible numerical mishap (though I can't see what
it might be), but the fact that it can make such a difference in
special cases means that I think we need to know what it's doing
and why -- just in case we don't need whatever it's supposed
to be there for.

After all, if I were going to use the values of dbinom(x,n,p)
as the ordering of the sample space for the test, I'd simply
get PVAL as

sum(dbinom(which(dbinom((0:n),n,p)<=dbinom(x,n,p))-1,n,p))
[1] 0.1693889

(as above!).

Comments?

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-07                                       Time: 22:29:21
------------------------------ XFMail ------------------------------



More information about the R-help mailing list