[Rd] phyper accuracy and efficiency (PR#6772)
terra at gnome.org
terra at gnome.org
Thu Apr 15 18:06:37 CEST 2004
Full_Name: Morten Welinder
Version: snapshot
OS:
Submission from: (NULL) (65.213.85.218)
Time to kick phyper's tires...
The current version has very serious cancellation issues. For example, if you
ask
for a small right-tail you are likely to get total cancellation. For example
phyper(59, 150, 150, 60, FALSE, FALSE) gives 6.372680161e-14. The right answer
is
dhyper(0, 150, 150, 60, FALSE) which is 5.111204798e-22.
phyper is also really slow for large arguments.
Therefore, I suggest using the code below. This is a sniplet from Gnumeric, so
you'll have to s/gnm_float/double/ and s/gnum//, etc. The code isn't perfect.
In fact, if i*(NR+NB) is close to n*NR, then this code can take a while.
Not longer than the old code, though.
/*
* New phyper implementation. Copyright 2004 Morten Welinder.
* Distributed under the GNU General Public License.
*
* Thanks to Ian Smith for ideas.
*/
/*
* Calculate
*
* phyper (i, NR, NB, n, TRUE, FALSE)
* [log] ----------------------------------
* dhyper (i, NR, NB, n, FALSE)
*
* without actually calling phyper. This assumes that
*
* i * (NR + NB) <= n * NR
*
*/
static gnm_float
pdhyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, gboolean log_p)
{
gnm_float sum = 0;
gnm_float term = 1;
while (i > 0 && term >= GNUM_EPSILON * sum) {
term *= i * (NB - n + i) / (n + 1 - i) / (NR + 1 - i);
sum += term;
i--;
}
return log_p ? log1pgnum (sum) : 1 + sum;
}
gnm_float
phyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, int lower_tail,
int log_p)
{
gnm_float d, pd;
#ifdef IEEE_754
if (isnangnum (i) || isnangnum (NR) || isnangnum (NB) || isnangnum (n))
return i + NR + NB + n;
#endif
i = floorgnum (i + 1e-7);
NR = floorgnum (NR + 0.5);
NB = floorgnum (NB + 0.5);
n = floorgnum (n + 0.5);
if (NR < 0 || NB < 0 || !finitegnum (NR + NB) || n < 0 || n > NR + NB)
ML_ERR_return_NAN;
if (i * (NR + NB) > n * NR) {
/* Swap tails. */
gnm_float oldNB = NB;
NB = NR;
NR = oldNB;
i = n - i - 1;
lower_tail = !lower_tail;
}
if (i < 0)
return R_DT_0;
d = dhyper (i, NR, NB, n, log_p);
pd = pdhyper (i, NR, NB, n, log_p);
return log_p ? R_DT_log (d + pd) : R_D_Lval (d * pd);
}
More information about the R-devel
mailing list