R-beta: Hypergeometric Probabilities

Vincent F. Melfi melfi at assist.stt.msu.edu
Mon Feb 23 21:08:58 CET 1998

In both versions of R to which I currently have access (R-0.16.1 and
R-0.61.1), "phyper" stops returning correct cumulative probabilities as the
parameters of the hypergeometric distribution get large. For example, when
N1=1345, N2=1055, and n=1330, phyper returns either 0 or 1, and nothing in

Looking at phyper.c, it's clear what's happening. First a term (called
"term") is computed: 

term = exp(lfastchoose(NR, xr) + lfastchoose(NB, xb)
                        - lfastchoose(N, n));

Then in a while loop, this is updated by multiplication, and each
subsequent "term" is added:

        while(xr <= x) {
                sum += term;
                term *= (NR / xr) * (xb / NB);

If the first "term" is 0, the rest will be. This is the case in the example
given above. 

One can get the correct cumulative probabilities by using "cumsum" and
"dhyper", of course. 

Possibly it would be wise to modify "phyper" to fix this problem. (One easy
modification would be to add an "if" statement to check on the parameters.
If they're suitably small, use the current method. If they're larger,
compute the probabilities as in "dhyper" and then add. 

A related question/comment: I assume similar problems may show up in other
algorithms in src/math and elsewhere. Would it be wise to either modify the
fucntions to return an error message in such cases, or to add something to
the help indicating the range of applicability?


Vince Melfi

PS: I'm by no means a programmer, so please be gentle if I've said anything
stupid in the above.
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list