[R] uniform integer RNG 0 to t inclusive

Sean O'Riordain seanpor at acm.org
Thu Sep 21 09:32:15 CEST 2006


Prof Ripley,
You are absolutely correct, this code will not work at all - for
starters, M isn't correctly initialized, etc.  I edited my code in
tinn-r and never ran it before posting... my apologies, I always seem
to be too quick off the mark to reply - despite the 4*runif(1)
suggestion in the posting guide...

I hadn't realized before what significant difference the replace=TRUE
would make to the runtime... Now I can just use the sample() code you
suggested and remove my runif() code altogether, as  sample(t+1, 1,
replace=TRUE) - 1 will work fine with t <- 2e9 which is considerably
more that I need.

Thanks again to both Prof Ripley and Duncan Murdoch,
Sean O'Riordain
affiliation <- NULL


On 19/09/06, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Tue, 19 Sep 2006, Sean O'Riordain wrote:
>
> > Hi Duncan,
> >
> > Thanks for that.  In the light of what you've suggested, I'm now using
> > the following:
> >
> >  # generate a random integer from 0 to t (inclusive)
> >  if (t < 10000000) { # to avoid memory problems...
> >    M <- sample(t, 1)
> >  } else {
> >    while (M > t) {
> >      M <- as.integer(urand(1,min=0, max=t+1-.Machine$double.eps))
> >    }
> >  }
>
> sample(t, 1) is a sample from 1:t, not 0:t.
>
> You need
>
> sample(t+1, 1, replace=TRUE) - 1
>
> which works in all cases up to INT_MAX-1, and beyond that you need to
> worry about the resolution of the RNG (and to use floor not as.integer).
>
> There is no such thing as urand in base R ....
>
> >
> > cheers and Thanks,
> > Sean
> >
> > On 18/09/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> >> On 9/18/2006 3:37 AM, Sean O'Riordain wrote:
> >>> Good morning,
> >>>
> >>> I'm trying to concisely generate a single integer from 0 to n
> >>> inclusive, where n might be of the order of hundreds of millions.
> >>> This will however be used many times during the general procedure, so
> >>> it must be "reasonably efficient" in both memory and time... (at some
> >>> later stage in the development I hope to go vectorized)
> >>>
> >>> The examples I've found through searching RSiteSearch() relating to
> >>> generating random integers say to use : sample(0:n, 1)
> >>> However, when n is "large" this first generates a large sequence 0:n
> >>> before taking a sample of one... this computer doesn't have the memory
> >>> for that!
> >>
> >> You don't need to give the whole vector:  just give n, and you'll get
> >> draws from 1:n.  The man page is clear on this.
> >>
> >> So what you want is sample(n+1, 1) - 1.  (Use "replace=TRUE" if you want
> >> a sample bigger than 1, or you'll get sampling without replacement.)
> >>>
> >>> When I look at the documentation for runif(n, min, max) it states that
> >>> the generated numbers will be min <= x <= max.  Note the "<= max"...
> >>
> >> Actually it says that's the range for the uniform density.  It's silent
> >> on the range of the output.  But it's good defensive programming to
> >> assume that it's possible to get the endpoints.
> >>
> >>>
> >>> How do I generate an x such that the probability of being (the
> >>> integer) max is the same as any other integer from min (an integer) to
> >>> max-1 (an integer) inclusive... My attempt is:
> >>>
> >>> urand.int <- function(n,t) {
> >>>   as.integer(runif(n,min=0, max=t+1-.Machine$double.eps))
> >>> }
> >>> # where I've included the parameter n to help testing...
> >>
> >> Because of rounding error, t+1-.Machine$double.eps might be exactly
> >> equal to t+1.  I'd suggest using a rejection method if you need to use
> >> this approach:  but sample() is better in the cases where as.integer()
> >> will work.
> >>
> >> Duncan Murdoch
> >>>
> >>> is floor() "better" than as.integer?
> >>>
> >>> Is this correct?  Is the probability of the integer t the same as the
> >>> integer 1 or 0 etc... I have done some rudimentary testing and this
> >>> appears to work, but power being what it is, I can't see how to
> >>> realistically test this hypothesis.
> >>>
> >>> Or is there a a better way of doing this?
> >>>
> >>> I'm trying to implement an algorithm which samples into an array,
> >>> hence the need for an integer - and yes I know about sample() thanks!
> >>> :-)
> >>>
> >>> { incidentally, I was surprised to note that the maximum value
> >>> returned by summary(integer_vector) is "pretty" and appears to be
> >>> rounded up to a "nice round number", and is not necessarily the same
> >>> as max(integer_vector) where the value is large, i.e. of the order of
> >>> say 50 million }
> >>>
> >>> Is version etc relevant? (I'll want to be portable)
> >>>> version               _
> >>> platform       i386-pc-mingw32
> >>> arch           i386
> >>> os             mingw32
> >>> system         i386, mingw32
> >>> status
> >>> major          2
> >>> minor          3.1
> >>> year           2006
> >>> month          06
> >>> day            01
> >>> svn rev        38247
> >>> language       R
> >>> version.string Version 2.3.1 (2006-06-01)
> >>>
> >>> Many thanks in advance for your help.
> >>> Sean O'Riordain
> >>> affiliation <- NULL
> >>>
> >>> ______________________________________________
> >>> R-help at stat.math.ethz.ch mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



More information about the R-help mailing list