[R] binom.test() query
(Ted Harding)
ted.harding at nessie.mcc.ac.uk
Thu Apr 5 23:35:44 CEST 2007
Hi Folks,
The recent correspondence about "strange fisher.test result",
and especially Peter Dalgaard's reply on Tue 03 April 2007
(which I want to investigate further) led me to take a close
look at the code for binom.test().
I now have a query!
The code for the two-sided case computes the p-value as follows:
if (p == 0) (x == 0)
else
if (p == 1) (x == n)
else {
relErr <- 1 + 1e-07
d <- dbinom(x, n, p)
m <- n * p
if (x == m) 1
else
## PVAL if x < m
if (x < m) {
i <- seq(from = ceiling(m), to = n)
y <- sum(dbinom(i, n, p) <= d * relErr)
## here:
pbinom(x,n,p) + pbinom(n-y,n,p,lower=FALSE)
}
## PVAL if x > m:
else {
i <- seq(from = 0, to = floor(m))
y <- sum(dbinom(i, n, p) <= d * relErr)
## here:
pbinom(y-1,n,p) + pbinom(x-1,n,p,lower=FALSE)
}
}
}
The case "x < m" will do for my query (since it is the same
for the case "x > m").
If I understand the code correctly, when x (the observed binomial
number) is less than m (its expected value on the null hypothesis
being tested), then the p-value is the total probability in the
lower tail, from 0 to x inclusive, plus a "bit".
The "bit" is the probability in the upper tail of those value
from (n-y) to n, where y is the number of values of i greater
than or equal to (if m is an integer) m such that
dbinom(i,n,p) <= d*(1 + 1e-07) = dbinom(x,n,p)*(1 + 1e-07).
or, in other words, ignoring the "1e-07", the values of i
which have binomial probability (on the NH) less than or equal
to the probability of x.
So, for x < m = n*p, the lower-tail contribution to the p-value
is the probability that X <= x, while the upper-tail contribution
is the probability that (X >= m)&(dbinom(X,n,p) <= dbinom(x,n,p)).
Now that is not so asymmetric as it looks, since for x < m = n*p
the values of dbinom(i,n,p) decrease as i decreases for i < x,
so in fact the left=hand tail is similarly the probability
that (X <= m)&(dbinom(X,n,p) <= dbinom(x,n,p)). In effect, therefore,
ignoring the "1e-07", we are simply calculating the probability
of "dbinom(X,n,p) <= dbinom(x,n,p)" overall.
But what is the purpose of the "1e-07"? I can see that it might
cause a miscalculation if the real intention is to simply sum
the probabilties dbinom(X,n,p) which are less than or equal to
dbinom(x,n,p).
For example (and I'm being a bit mischievous here, but I'm
theoretically entitled to be):
p<-0.3527785166
n<-20
x<-(4:10)
print(cbind(x,dbinom(x,n,p)),digits=10)
x
[1,] 4 0.07114577135
[2,] 5 0.12409328342
[3,] 6 0.16909761793
[4,] 7 0.18433877225
[5,] 8 0.16327483787
[6,] 9 0.11866078116
[7,] 10 0.07114577154
Now I run the standard binom.test():
x<-4 ; n<-20 ; p<-0.3527785166 ; binom.test(x,n,p)$p.value
[1] 0.2405347
Next, I modify the code for binom.test so that relErr <- 1
(instead of 1 + 1e-07), and I call this new function
binom.test.0 and run it on the same data:
binom.test.0(x,n,p)$p.value
[1] 0.1693889
The difference is the value of dbinom(10,n,p) = 0.07114577...
So the "1e-07" has done its dirty work -- it has added 40% to
my p-value! Admittedly it wasn't particularly significant to
start with, at 0.17 (in round numbers); but it's much less so
at 0.24, and if I wanted I'm sure I could construct a really
"significant" example!
Anyway, I dare say the factor (1 + 1e-07) is there as a precaution
against some possible numerical mishap (though I can't see what
it might be), but the fact that it can make such a difference in
special cases means that I think we need to know what it's doing
and why -- just in case we don't need whatever it's supposed
to be there for.
After all, if I were going to use the values of dbinom(x,n,p)
as the ordering of the sample space for the test, I'd simply
get PVAL as
sum(dbinom(which(dbinom((0:n),n,p)<=dbinom(x,n,p))-1,n,p))
[1] 0.1693889
(as above!).
Comments?
With thanks,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-07 Time: 22:29:21
------------------------------ XFMail ------------------------------
More information about the R-help
mailing list