[R] uniform integer RNG 0 to t inclusive
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Sep 19 12:45:11 CEST 2006
On 9/19/2006 4:41 AM, Prof Brian Ripley wrote:
> On Tue, 19 Sep 2006, Sean O'Riordain wrote:
>
>> Hi Duncan,
>>
>> Thanks for that. In the light of what you've suggested, I'm now using
>> the following:
>>
>> # generate a random integer from 0 to t (inclusive)
>> if (t < 10000000) { # to avoid memory problems...
>> M <- sample(t, 1)
>> } else {
>> while (M > t) {
>> M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps))
>> }
>> }
>
> sample(t, 1) is a sample from 1:t, not 0:t.
>
> You need
>
> sample(t+1, 1, replace=TRUE) - 1
>
> which works in all cases up to INT_MAX-1, and beyond that you need to
> worry about the resolution of the RNG (and to use floor not as.integer).
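
A minimal sketch of the suggestion quoted above, for anyone reading this in the archive. The helper name rand_int_0_to_t is hypothetical (it is not from this thread or from base R), and the runif() branch with rejection is only one assumed way to handle t of INT_MAX or more:

```r
# Hypothetical helper, not from the thread: one draw from 0:t inclusive.
rand_int_0_to_t <- function(t) {
  stopifnot(length(t) == 1, is.finite(t), t >= 0, t == floor(t))
  t <- as.numeric(t)  # avoid integer overflow when computing t + 1
  if (t < .Machine$integer.max) {
    # sample(n, 1) draws from 1:n, so shift down by one to cover 0:t.
    return(sample(t + 1, 1, replace = TRUE) - 1)
  }
  # Larger t: floor() of a uniform draw, rejecting the endpoint t + 1 in
  # case runif() ever returns its max. The resolution of the RNG still
  # limits which integers can actually occur in this branch.
  repeat {
    M <- floor(runif(1, min = 0, max = t + 1))
    if (M <= t) return(M)
  }
}

set.seed(1)
draws <- replicate(1000, rand_int_0_to_t(5))
range(draws)
```

The rejection loop is defensive: it costs almost nothing and removes any dependence on whether runif() can return its upper endpoint.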

I wonder if it would be a worthwhile optimization to treat replace as
TRUE whenever size=1 is requested.

- It would be a very cheap test in the C code, and would make a large
  difference to the size=1 run time when n is very large.
- On the other hand, using size=1 is usually not a very efficient way
  to program anything, so anyone who does it might not notice the gain...

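
To illustrate the second point, here is a rough sketch of asking for all the draws in one call instead of calling sample() with size=1 in a loop. The values of t and k are made up for the example:

```r
set.seed(42)
t <- 300000000  # upper bound, of the order of hundreds of millions
k <- 1000       # number of draws wanted (illustrative value)

# One call returning k independent draws from 0:t.
# replace = TRUE is needed here, otherwise sample() draws without
# replacement and the values are not independent.
M <- sample(t + 1, k, replace = TRUE) - 1

length(M)
```

Each element of M can then be used wherever a single draw was needed.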
Duncan Murdoch
>
> There is no such thing as urand in base R ....
>
>> cheers and Thanks,
>> Sean
>>
>> On 18/09/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>> On 9/18/2006 3:37 AM, Sean O'Riordain wrote:
>>>> Good morning,
>>>>
>>>> I'm trying to concisely generate a single integer from 0 to n
>>>> inclusive, where n might be of the order of hundreds of millions.
>>>> This will however be used many times during the general procedure, so
>>>> it must be "reasonably efficient" in both memory and time... (at some
>>>> later stage in the development I hope to go vectorized)
>>>>
>>>> The examples I've found through searching RSiteSearch() relating to
>>>> generating random integers say to use : sample(0:n, 1)
>>>> However, when n is "large" this first generates a large sequence 0:n
>>>> before taking a sample of one... this computer doesn't have the memory
>>>> for that!
>>> You don't need to give the whole vector: just give n, and you'll get
>>> draws from 1:n. The man page is clear on this.
>>>
>>> So what you want is sample(n+1, 1) - 1. (Use "replace=TRUE" if you want
>>> a sample bigger than 1, or you'll get sampling without replacement.)
>>>> When I look at the documentation for runif(n, min, max) it states that
>>>> the generated numbers will be min <= x <= max. Note the "<= max"...
>>> Actually it says that's the range for the uniform density. It's silent
>>> on the range of the output. But it's good defensive programming to
>>> assume that it's possible to get the endpoints.
>>>
>>>> How do I generate an x such that the probability of being (the
>>>> integer) max is the same as any other integer from min (an integer) to
>>>> max-1 (an integer) inclusive... My attempt is:
>>>>
>>>> urand.int <- function(n,t) {
>>>> as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
>>>> }
>>>> # where I've included the parameter n to help testing...
>>> Because of rounding error, t+1-.Machine$double.eps might be exactly
>>> equal to t+1. I'd suggest using a rejection method if you need to use
>>> this approach: but sample() is better in the cases where as.integer()
>>> will work.
>>>
>>> Duncan Murdoch
>>>> is floor() "better" than as.integer?
>>>>
>>>> Is this correct? Is the probability of the integer t the same as the
>>>> integer 1 or 0 etc... I have done some rudimentary testing and this
>>>> appears to work, but power being what it is, I can't see how to
>>>> realistically test this hypothesis.
>>>>
>>>> Or is there a better way of doing this?
>>>>
>>>> I'm trying to implement an algorithm which samples into an array,
>>>> hence the need for an integer - and yes I know about sample() thanks!
>>>> :-)
>>>>
>>>> { incidentally, I was surprised to note that the maximum value
>>>> returned by summary(integer_vector) is "pretty" and appears to be
>>>> rounded up to a "nice round number", and is not necessarily the same
>>>> as max(integer_vector) where the value is large, i.e. of the order of
>>>> say 50 million }
>>>>
>>>> Is version etc relevant? (I'll want to be portable)
>>>> > version
>>>>                _
>>>> platform       i386-pc-mingw32
>>>> arch           i386
>>>> os             mingw32
>>>> system         i386, mingw32
>>>> status
>>>> major          2
>>>> minor          3.1
>>>> year           2006
>>>> month          06
>>>> day            01
>>>> svn rev        38247
>>>> language       R
>>>> version.string Version 2.3.1 (2006-06-01)
>>>>
>>>> Many thanks in advance for your help.
>>>> Sean O'Riordain
>>>> affiliation <- NULL
>>>>
>>>> ______________________________________________
>>>> R-help at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>