[Rd] RFC: lchoose() vs lfactorial() etc

Petr Savicky savicky at cs.cas.cz
Fri Dec 18 00:14:49 CET 2009


On Thu, Dec 17, 2009 at 03:10:49PM +0100, Martin Maechler wrote:
[...]
>     MM> This, of course, is an even more compelling reason to implement
>     MM> the change of return  log(abs(choose(.,.)),
>     MM> and at the moment, I'd even plan to "backport" that to R "2.10.1
>     MM> patched", as the current behavior is clearly bugous.
> 
> This has now happened;
> svn revisions  50775 and 50776  {R-devel and R-patched}.

Thank you.

When preparing a test, whether lchoose(n, k) \approx log(abs(choose(n, k))), it
appeared that there is a minor problem also in choose(n, k), when n is very close
to k, but not equal.

  options(digits=15)
  n <- 5 + (-15:15)*1e-8
  cbind(n, choose(n, 5))
  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 0.000000000000000
  #  [8,] 4.99999992 0.000000000000000
  #  [9,] 4.99999993 0.000000000000000
  # [10,] 4.99999994 0.000000000000000
  # [11,] 4.99999995 0.000000000000000
  # [12,] 4.99999996 0.000000000000000
  # [13,] 4.99999997 0.000000000000000
  # [14,] 4.99999998 0.000000000000000
  # [15,] 4.99999999 0.000000000000000
  # [16,] 5.00000000 1.000000000000000
  # [17,] 5.00000001 5.000000000000000
  # [18,] 5.00000002 5.000000000000000
  # [19,] 5.00000003 5.000000000000000
  # [20,] 5.00000004 5.000000000000000
  # [21,] 5.00000005 5.000000000000000
  # [22,] 5.00000006 5.000000000000000
  # [23,] 5.00000007 5.000000000000000
  # [24,] 5.00000008 5.000000000000000
  # [25,] 5.00000009 5.000000000000000
  # [26,] 5.00000010 1.000000228333353
  # [27,] 5.00000011 1.000000251166690
  # [28,] 5.00000012 1.000000274000027
  # [29,] 5.00000013 1.000000296833365
  # [30,] 5.00000014 1.000000319666704
  # [31,] 5.00000015 1.000000342500042

All the values in the second column should be close to 1, but some are 0 and
some are 5. The reason seems to be in the line 112 of src/nmath/choose.c,
which is
  if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
If n is not exactly an integer, then k becomes also non-integer. 
Since the code relies on k being an exact integer, we get an error
as follows.

If n = k - eps for a small positive eps, then the next line 113
  if (k <  0) return 0.;
determines the output since k is now - eps.

If n = k + eps for a small positive eps, then the line 116
  r = n;
determines the output, since now k = eps, so the condition j <= k is not 
satisfied already at the beginning of the next cycle.

I suggest two solutions, a more conservative one and a less conservative one.

A more conservative solution is to replace the line 112 by
  if(n-k < k && n >= 0 && R_IS_INT(n)) k = floor(n - k + 0.5); /* <- Symmetry */
This yields the following in the same test as above.

  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 1.000000000000000
  #  [8,] 4.99999992 1.000000000000000
  #  [9,] 4.99999993 1.000000000000000
  # [10,] 4.99999994 1.000000000000000
  # [11,] 4.99999995 1.000000000000000
  # [12,] 4.99999996 1.000000000000000
  # [13,] 4.99999997 1.000000000000000
  # [14,] 4.99999998 1.000000000000000
  # [15,] 4.99999999 1.000000000000000
  # [16,] 5.00000000 1.000000000000000
  # [17,] 5.00000001 1.000000000000000
  # [18,] 5.00000002 1.000000000000000
  # [19,] 5.00000003 1.000000000000000
  # [20,] 5.00000004 1.000000000000000
  # [21,] 5.00000005 1.000000000000000
  # [22,] 5.00000006 1.000000000000000
  # [23,] 5.00000007 1.000000000000000
  # [24,] 5.00000008 1.000000000000000
  # [25,] 5.00000009 1.000000000000000
  # [26,] 5.00000010 1.000000228333353
  # [27,] 5.00000011 1.000000251166690
  # [28,] 5.00000012 1.000000274000027
  # [29,] 5.00000013 1.000000296833365
  # [30,] 5.00000014 1.000000319666704
  # [31,] 5.00000015 1.000000342500042

So, the extreme values 0 and 5 do not occur, but there is an interval
of constant values and on both ends of this interval, there is a jump
much larger than machine accuracy.

A less conservative solution (a patch is in an attachment) replaces
lines 110 - 121 by

    if (k < k_small_max) {
        int j;
        if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */
        if (k <  0) return 0.;
        if (k == 0) return 1.;
        /* else: k >= 1 */
        r = n;
        for(j = 2; j <= k; j++) {
            r *= (n-j+1);
            r /= j;
        }
        return r;
    }

Here, the symetry is used only for exact integers. The symetry is
valid only for integers, since choose(n, k) is a polynomial of
degree k. So, for different values of k, we get a different functions
on any interval of positive length.

The division at the line
            r /= j;
always produces an exact integer, since during the whole cycle, r is always
either an integer binomial coefficient or its integer multiple. So, the
division has an integer result and there is no rounding error unless the
intermediate result does not fit into 53 bit precison.

Using the above code, we get

  options(digits=15)
  n <- 5 + (-15:15)*1e-8
  cbind(n, choose(n, 5))
  #                n                  
  #  [1,] 4.99999985 0.999999657500042
  #  [2,] 4.99999986 0.999999680333370
  #  [3,] 4.99999987 0.999999703166698
  #  [4,] 4.99999988 0.999999726000027
  #  [5,] 4.99999989 0.999999748833356
  #  [6,] 4.99999990 0.999999771666685
  #  [7,] 4.99999991 0.999999794500014
  #  [8,] 4.99999992 0.999999817333345
  #  [9,] 4.99999993 0.999999840166677
  # [10,] 4.99999994 0.999999863000008
  # [11,] 4.99999995 0.999999885833339
  # [12,] 4.99999996 0.999999908666670
  # [13,] 4.99999997 0.999999931500002
  # [14,] 4.99999998 0.999999954333335
  # [15,] 4.99999999 0.999999977166667
  # [16,] 5.00000000 1.000000000000000
  # [17,] 5.00000001 1.000000022833334
  # [18,] 5.00000002 1.000000045666667
  # [19,] 5.00000003 1.000000068500001
  # [20,] 5.00000004 1.000000091333336
  # [21,] 5.00000005 1.000000114166671
  # [22,] 5.00000006 1.000000137000006
  # [23,] 5.00000007 1.000000159833342
  # [24,] 5.00000008 1.000000182666680
  # [25,] 5.00000009 1.000000205500016
  # [26,] 5.00000010 1.000000228333353
  # [27,] 5.00000011 1.000000251166690
  # [28,] 5.00000012 1.000000274000027
  # [29,] 5.00000013 1.000000296833365
  # [30,] 5.00000014 1.000000319666704
  # [31,] 5.00000015 1.000000342500042

Within the chosen accuracy 1e-8, there is no visible jump.
With 1e-15, we get

  options(digits=15)
  n <- 5 + (-15:15)*1e-15
  cbind(n, choose(n, 5))
  #                      n                  
  #  [1,] 4.99999999999998 0.999999999999966
  #  [2,] 4.99999999999999 0.999999999999967
  #  [3,] 4.99999999999999 0.999999999999970
  #  [4,] 4.99999999999999 0.999999999999972
  #  [5,] 4.99999999999999 0.999999999999976
  #  [6,] 4.99999999999999 0.999999999999978
  #  [7,] 4.99999999999999 0.999999999999980
  #  [8,] 4.99999999999999 0.999999999999982
  #  [9,] 4.99999999999999 0.999999999999984
  # [10,] 4.99999999999999 0.999999999999986
  # [11,] 4.99999999999999 0.999999999999988
  # [12,] 5.00000000000000 0.999999999999990
  # [13,] 5.00000000000000 0.999999999999994
  # [14,] 5.00000000000000 0.999999999999996
  # [15,] 5.00000000000000 0.999999999999998
  # [16,] 5.00000000000000 1.000000000000000
  # [17,] 5.00000000000000 1.000000000000002
  # [18,] 5.00000000000000 1.000000000000004
  # [19,] 5.00000000000000 1.000000000000006
  # [20,] 5.00000000000000 1.000000000000010
  # [21,] 5.00000000000001 1.000000000000012
  # [22,] 5.00000000000001 1.000000000000014
  # [23,] 5.00000000000001 1.000000000000016
  # [24,] 5.00000000000001 1.000000000000018
  # [25,] 5.00000000000001 1.000000000000020
  # [26,] 5.00000000000001 1.000000000000022
  # [27,] 5.00000000000001 1.000000000000024
  # [28,] 5.00000000000001 1.000000000000028
  # [29,] 5.00000000000001 1.000000000000031
  # [30,] 5.00000000000001 1.000000000000032
  # [31,] 5.00000000000002 1.000000000000035

So, the jump in the exactly integer value n[16] == 5 is comparable
to machine accuracy.

I would be pleased to provide more tests, if some of the above solutions
or their modifications can be accepted.

Petr Savicky.

-------------- next part --------------
--- R-devel_2009-12-16/src/nmath/choose.c	2008-03-17 17:52:35.000000000 +0100
+++ R-devel_less_conservative/src/nmath/choose.c	2009-12-17 22:43:54.000000000 +0100
@@ -107,19 +107,20 @@
 #endif
     if (fabs(k - k0) > 1e-7)
 	MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
     if (k < k_small_max) {
 	int j;
-	if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
+	if(n-k < k && n >= 0 && (n == floor(n))) k = n-k; /* <- Symmetry */
 	if (k <	 0) return 0.;
 	if (k == 0) return 1.;
 	/* else: k >= 1 */
 	r = n;
-	for(j = 2; j <= k; j++)
-	    r *= (n-j+1)/j;
-	return R_IS_INT(n) ? floor(r + 0.5) : r;
-	/* might have got rounding errors */
+	for(j = 2; j <= k; j++) {
+	    r *= (n-j+1);
+	    r /= j;
+	}
+	return r;
     }
     /* else: k >= k_small_max */
     if (n < 0) {
 	r = choose(-n+ k-1, k);
 	if (ODD(k)) r = -r;


More information about the R-devel mailing list