[Rd] Change in the RNG implementation?

Duncan Murdoch murdoch.duncan at gmail.com
Sat Oct 20 01:26:39 CEST 2012


On 12-10-19 7:04 PM, Hervé Pagès wrote:
> Hi,
>
> Looks like the implementation of random number generation changed in
> R-devel with respect to R-2.15.1.
>
> With R-2.15.1:
>
>     > set.seed(33)
>     > sample(49821115, 10)
>      [1] 22217252 19661919 24099911 45779422 42043111 25774933 21778053 17098516
>      [9]   773073  5878451
>
> With recent R-devel:
>
>     > set.seed(33)
>     > sample(49821115, 10)
>      [1] 22217252 19661919 24099912 45779425 42043115 25774935 21778056 17098518
>      [9]   773073  5878452
>
> This is on a 64-bit Ubuntu system.
>
> Is this change intended? I didn't see anything in the NEWS file.
>
> A potential problem with this is that it will break unit tests
> for algorithms that make use of RNG.
>
> Another more practical problem (at least for me) is the following:
> Bioconductor package maintainers are sometimes working hard on the
> development version of their package to improve the performance of
> some key functions. Comparing performance between BioC release
> (based on R-2.15) and devel (based on R-devel) often requires big
> input data that is randomly generated, because it's easier than
> working with real data. Typically a small script is written that
> takes care of loading the required packages, generating the input
> data, and running a simple analysis. The same script is sourced in
> R-2.15 and R-devel, and performance and results are compared.
>
> Not being able to generate exactly the same input in the script is
> a problem. It can be worked around by generating the input once,
> serializing it, and using load() in the script, but that makes things
> more complicated and the script is not a standalone script anymore
> (cannot be passed around without also passing around the big .rda
> file).
>
> Thanks,
> H.
>

I think it was mentioned in the NEWS:

  \code{sample.int()} has some support for  \eqn{n \ge
  2^{31}}{n >= 2^31}: see its help for the limitations.

  A different algorithm is used for \code{(n, size, replace = FALSE,
  prob = NULL)} for \code{n > 1e7} and \code{size <= n/2}.  This
  is much faster and uses less memory, but does give different results.
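
To see whether a given call falls under that entry, the conditions can be
written out as a small predicate. This is only a sketch: uses_new_algorithm
is a hypothetical helper, not part of R, and it assumes the thresholds are
exactly as worded in the NEWS text above.

```r
# Hypothetical helper (not part of R): TRUE when a call to sample.int()
# meets the conditions quoted in the NEWS entry (assumed to be exact).
uses_new_algorithm <- function(n, size = n, replace = FALSE, prob = NULL) {
    !replace && is.null(prob) && n > 1e7 && size <= n / 2
}

uses_new_algorithm(49821115, 10)  # the call from the report: TRUE
uses_new_algorithm(1e6, 10)       # n is below the 1e7 threshold: FALSE
```

By that reading, the call in the report is affected, while the same call with
a smaller n would not be.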

I don't think the old algorithm is available, but perhaps it could be 
made available by an optional parameter.
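
In the meantime, a script that needs the same input in both versions could
avoid sample() and build the draw from runif() instead. The following is a
rough sketch, not the old or the new algorithm: sample_distinct is a
hypothetical helper, and it rests on the assumption (which I have not
checked) that runif() returns the same stream for a given seed in R-2.15.1
and R-devel.

```r
# Hypothetical helper (not part of R): draw `size` distinct integers from
# 1..n using only runif(), redrawing until enough distinct values exist.
# Assumption: runif() is unchanged between the R versions being compared.
sample_distinct <- function(n, size) {
    stopifnot(size <= n)
    out <- numeric(0)
    while (length(out) < size) {
        # Draw only as many values as are still missing.
        draws <- floor(runif(size - length(out)) * n) + 1
        # unique() keeps first occurrences, so the order is deterministic.
        out <- unique(c(out, draws))
    }
    out
}

set.seed(33)
x <- sample_distinct(49821115, 10)
```

This is only practical when size is small relative to n, since the loop
redraws every time a duplicate turns up.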

Duncan Murdoch


