[R] uniform integer RNG 0 to t inclusive

Tue Sep 19 12:45:11 CEST 2006

On 9/19/2006 4:41 AM, Prof Brian Ripley wrote:
> On Tue, 19 Sep 2006, Sean O'Riordain wrote:
> 
>> Hi Duncan,
>>
>> Thanks for that.  In the light of what you've suggested, I'm now using
>> the following:
>>
>>  # generate a random integer from 0 to t (inclusive)
>>  if (t < 10000000) { # to avoid memory problems...
>>    M <- sample(t, 1)
>>  } else {
>>    while (M > t) {
>>      M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps))
>>    }
>>  }
> 
> sample(t, 1) is a sample from 1:t, not 0:t.
> 
> You need
> 
> sample(t+1, 1, replace=TRUE) - 1
> 
> which works in all cases up to INT_MAX-1, and beyond that you need to 
> worry about the resolution of the RNG (and to use floor not as.integer).
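
For reference, the suggestion quoted above can be wrapped up as a minimal sketch. The helper name rand_int0 and its size argument are illustrative assumptions, not something from this thread, and the sketch only covers t below .Machine$integer.max; beyond that the caveat about RNG resolution applies.

```r
# Hypothetical helper: the name rand_int0 is an assumption for illustration.
# Draws `size` integers uniformly from 0:t without building the vector 0:t.
rand_int0 <- function(t, size = 1) {
  # Only the range where sample() on a single number is safe is handled here.
  stopifnot(length(t) == 1, t >= 0, t < .Machine$integer.max)
  # sample(n, size) with a single number n >= 1 draws from 1:n,
  # so shift the result down by one to cover 0:t.
  sample(t + 1, size, replace = TRUE) - 1L
}

# A single draw from 0:300000000 (no 0:n vector is passed in)
M <- rand_int0(3e8)
```

Passing replace=TRUE explicitly keeps the draws independent when size is larger than 1.
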

I wonder if it would be a worthwhile optimization to treat replace as 
TRUE whenever size=1 is requested.

  - It would be a very cheap test in the C code, and would make a large 
difference to the size=1 run time when n is very large.

  - On the other hand, using size=1 is usually not a very efficient way 
to program anything, so anyone who does it might not notice the gain...

Duncan Murdoch

> 
> There is no such thing as urand in base R ....
> 
>> cheers and Thanks,
>> Sean
>>
>> On 18/09/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>> On 9/18/2006 3:37 AM, Sean O'Riordain wrote:
>>>> Good morning,
>>>>
>>>> I'm trying to concisely generate a single integer from 0 to n
>>>> inclusive, where n might be of the order of hundreds of millions.
>>>> This will however be used many times during the general procedure, so
>>>> it must be "reasonably efficient" in both memory and time... (at some
>>>> later stage in the development I hope to go vectorized)
>>>>
>>>> The examples I've found through searching RSiteSearch() relating to
>>>> generating random integers say to use : sample(0:n, 1)
>>>> However, when n is "large" this first generates a large sequence 0:n
>>>> before taking a sample of one... this computer doesn't have the memory
>>>> for that!
>>> You don't need to give the whole vector:  just give n, and you'll get
>>> draws from 1:n.  The man page is clear on this.
>>>
>>> So what you want is sample(n+1, 1) - 1.  (Use "replace=TRUE" if you want
>>> a sample bigger than 1, or you'll get sampling without replacement.)
>>>> When I look at the documentation for runif(n, min, max) it states that
>>>> the generated numbers will be min <= x <= max.  Note the "<= max"...
>>> Actually it says that's the range for the uniform density.  It's silent
>>> on the range of the output.  But it's good defensive programming to
>>> assume that it's possible to get the endpoints.
>>>
>>>> How do I generate an x such that the probability of being (the
>>>> integer) max is the same as any other integer from min (an integer) to
>>>> max-1 (an integer) inclusive... My attempt is:
>>>>
>>>> urand.int <- function(n,t) {
>>>>   as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
>>>> }
>>>> # where I've included the parameter n to help testing...
>>> Because of rounding error, t+1-.Machine$double.eps might be exactly
>>> equal to t+1.  I'd suggest using a rejection method if you need to use
>>> this approach:  but sample() is better in the cases where as.integer()
>>> will work.
>>>
>>> Duncan Murdoch
>>>> is floor() "better" than as.integer?
>>>>
>>>> Is this correct?  Is the probability of the integer t the same as the
>>>> integer 1 or 0 etc... I have done some rudimentary testing and this
>>>> appears to work, but power being what it is, I can't see how to
>>>> realistically test this hypothesis.
>>>>
>>>> Or is there a better way of doing this?
>>>>
>>>> I'm trying to implement an algorithm which samples into an array,
>>>> hence the need for an integer - and yes I know about sample() thanks!
>>>> :-)
>>>>
>>>> { incidentally, I was surprised to note that the maximum value
>>>> returned by summary(integer_vector) is "pretty" and appears to be
>>>> rounded up to a "nice round number", and is not necessarily the same
>>>> as max(integer_vector) where the value is large, i.e. of the order of
>>>> say 50 million }
>>>>
>>>> Is version etc relevant? (I'll want to be portable)
>>>>> version               _
>>>> platform       i386-pc-mingw32
>>>> arch           i386
>>>> os             mingw32
>>>> system         i386, mingw32
>>>> status
>>>> major          2
>>>> minor          3.1
>>>> year           2006
>>>> month          06
>>>> day            01
>>>> svn rev        38247
>>>> language       R
>>>> version.string Version 2.3.1 (2006-06-01)
>>>>
>>>> Many thanks in advance for your help.
>>>> Sean O'Riordain
>>>> affiliation <- NULL
>>>>
>>>> ______________________________________________
>>>> R-help at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.