[R] About 'choose' function

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Mon Nov 8 15:29:46 CET 2004


On 08-Nov-04 John wrote:
> Hello R-users,
> 
> When I didn't know about the internal 'choose'
> function, I made such function, 'my.choose' below. But
> when I used them instead of choose(6000,20), they
> didn't give me any answer.
> 
> What is the difference between 'choose', 'my.choose1',
> and 'my.choose2' below? That is, what is behind
> 'choose' function and what's the problem using 'prod'
> or 'gamma' function?
> 
> Thanks a lot.
> 
> John
> 
>##########
> 
>> choose(6000,20)
> [1] 1.455904e+57
>>
>> my.choose1 <- function(x,y) {
> prod(1:x)/(prod(1:y)*prod(1:(x-y))) }
>> my.choose1(6000,20)
> [1] NaN
>>
>> my.choose2 <- function(x,y) {
> gamma(x+1)/(gamma(y+1)*gamma(x-y+1)) }
>> my.choose2(6000,20)
> [1] NaN

Your problem arises because the first thing that gets computed
in  your functions is the factorial of a large number, which
goes out of range; you then compound this by computing the
factorial of another large number, and dividing one by the
other:

  > prod(1:6000)
  [1] Inf
  > prod(1:5980)
  [1] Inf
  > prod(1:6000)/prod(1:5980)
  [1] NaN

It doesn't matter whether you do this using 'prod' or 'gamma'.

Those of us who grew up in the old days when you had to do things
the hard way learned tricks like:

  my.choose3 <- function(x,y){
    if((x==y)||(y==0)) return(1);
    m <- min(y,x-y)
    prod((x:(x-m+1))/(m:1))
  }

i.e. implementing, using term-by-term ratios (which keeps the
numbers within bounds all the way)  either

  (x/y)*((x-1)/(y-1))*...*((x-y+2)/2)*((x-y+1)/1)

  = x*(x-1)*...*(x-y+1)/{y*(y-1)*...*2*1}

or

  (x/(x-y))*((x-1)/(x-y-1))*...*((y+2)/2)*((y+1)/1)

  = x*(x-1)*...*(y+1)/{(x-y)*(x-y-1)*...*2*1}

whichever gives the shorter product. (Actually, in those days
we did it with loops too, or even by turning a handle -- all
the more reason to keep it short!).

Anyway:

  > choose(6000,20)
  [1] 1.455904e+57
  > my.choose3(6000,20)
  [1] 1.455904e+57

(And, if you try it, you'll see that 'my.choose3' is pretty fast).

However, as written above 'my.choose3' doesn't like really large
arguments:

  > choose(60000000000,31)
  [1] 1.612899e+300
  > my.choose3(60000000000,31)
  Error in x:(x - m + 1) : argument too large in magnitude

because the result of ":" is integer. However, it works OK
in the following form:

  my.choose3<-function(x,y){
    if((x==y)||(y==0)) return(1);
    m <- min(y,x-y)
   prod(seq(x,(x-m+1),by=-1)/(seq(m,1,by=-1)))
  }

when

  my.choose3(60000000000,31)
  [1] 1.613121e+300

which has a slight difference (0.014% greater) from the result of
'choose'.

Given the method of computation, I might feel inclined to trust
'my.choose3' rather than 'choose', but I'm not at sure of this
without studying the internal code of 'choose', and would welcome
comments!

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 08-Nov-04                                       Time: 14:29:46
------------------------------ XFMail ------------------------------




More information about the R-help mailing list