[Rd] [R] choose(n, k) as n approaches k

peter dalgaard pd@|gd @end|ng |rom gm@||@com
Wed Jan 15 13:59:55 CET 2020


Fix committed to R-devel w/regression test. I settled for just doing

k = R_forceint(n - k) 

inside 

if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */

so that k stays integer. 

In principle, you also could prefix this code 

	r = n;
	for(j = 2; j <= k; j++)
	    r *= (n-j+1)/j;
	return R_IS_INT(n) ? R_forceint(r) : r;
	/* might have got rounding errors */
with

	if(R_IS_INT(n)) n = R_forceint(n);

but as I said, I believe that is really a no-op because of the rounding at the end.

-pd



> On 14 Jan 2020, at 19:20 , peter dalgaard <pdalgd using gmail.com> wrote:
> 
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the 
> 
>        if (k <  0) return 0.;
>        if (k == 0) return 1.;
>        /* else: k >= 1 */
> 
> part, because at that point k is sure to be integer, possibly after rounding. 
> 
> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
> 
>        return R_IS_INT(n) ? R_forceint(r) : r;
> 
> part.
> 
> -pd
> 
> 
> 
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>> 
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>> 
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>     if (k < k_small_max) {
>>>>>        int j;
>>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>        if (k <  0) return 0.;
>>>>>        if (k == 0) return 1.;
>>>>>        /* else: k >= 1 */
>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>> n == 4 - 1e-7
>>>>> k == 4
>>>>> n - k == -1e-7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>> so k gets set to
>>>>> n - k == -1e-7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>> [1] -9.289025e-11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>> 
>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>> 
>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>> 
>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>> 
>> I believe that lchoose(n, k) already does this.
>> 
>> Duncan Murdoch
>> 
>>>    double r, k0 = k;
>>>    k = R_forceint(k);
>>> ...
>>>    if (fabs(k - k0) > 1e-7)
>>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>> 
>>>> Duncan Murdoch
>>>> 
>>>>> -pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT using pitt.edu> wrote:
>>>>>> 
>>>>>> This struck me as incorrect:
>>>>>> 
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>> 
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>> 
>>>>>> Thanks,
>>>>>> Erik
>>>>>> 
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>> 
>>>>>> 	[[alternative HTML version deleted]]
>>>>>> 
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
> 
> 
> 
> 
> 
> 
> 
> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



More information about the R-devel mailing list