[Rd] [R] choose(n, k) as n approaches k
peter dalgaard
pd@|gd @end|ng |rom gm@||@com
Wed Jan 15 01:25:54 CET 2020
That crossed my mind too, but presumably someone designed choose() to handle the near-integer cases specially. Otherwise, we already have beta() -- you just need to remember what the connection is ;-).
I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq().
-pd
;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts
> On 14 Jan 2020, at 19:36 , John Mount <jmount using win-vector.com> wrote:
>
>
> At the risk of throwing oil on a fire. If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation? In particular k<0 may not imply the function should evaluate to zero until we get k<=-1.
>
> Example:
>
> ``` r
> choose(5, 4)
> #> [1] 5
>
> gchoose <- function(n, k) {
> gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
> }
>
> gchoose(5, 4)
> #> [1] 5
> gchoose(5, 0)
> #> [1] 1
> gchoose(5, -0.5)
> #> [1] 0.2351727
> ```
>
>> On Jan 14, 2020, at 10:20 AM, peter dalgaard <pdalgd using gmail.com> wrote:
>>
>> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>>
>> if (k < 0) return 0.;
>> if (k == 0) return 1.;
>> /* else: k >= 1 */
>>
>> part, because at that point k is sure to be integer, possibly after rounding.
>>
>> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>>
>> return R_IS_INT(n) ? R_forceint(r) : r;
>>
>> part.
>>
>> -pd
>>
>>
>>
>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>
>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>>>
>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>> if (k < k_small_max) {
>>>>>> int j;
>>>>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>> if (k < 0) return 0.;
>>>>>> if (k == 0) return 1.;
>>>>>> /* else: k >= 1 */
>>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>>> n == 4 - 1e-7
>>>>>> k == 4
>>>>>> n - k == -1e-7 < 4
>>>>>> n >= 0
>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>>> so k gets set to
>>>>>> n - k == -1e-7
>>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>>> [1] -9.289025e-11
>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>>
>>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>>
>>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>>
>>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>>
>>> I believe that lchoose(n, k) already does this.
>>>
>>> Duncan Murdoch
>>>
>>>> double r, k0 = k;
>>>> k = R_forceint(k);
>>>> ...
>>>> if (fabs(k - k0) > 1e-7)
>>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>> -pd
>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT using pitt.edu> wrote:
>>>>>>>
>>>>>>> This struck me as incorrect:
>>>>>>>
>>>>>>>> choose(3.999999, 4)
>>>>>>> [1] 0.9999979
>>>>>>>> choose(3.9999999, 4)
>>>>>>> [1] 0
>>>>>>>> choose(4, 4)
>>>>>>> [1] 1
>>>>>>>> choose(4.0000001, 4)
>>>>>>> [1] 4
>>>>>>>> choose(4.000001, 4)
>>>>>>> [1] 1.000002
>>>>>>>
>>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Erik
>>>>>>>
>>>>>>> ***
>>>>>>> sessionInfo()
>>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ---------------
> John Mount
> http://www.win-vector.com/
> Our book: Practical Data Science with R
> http://practicaldatascience.com
>
>
>
>
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com
More information about the R-devel
mailing list