[Rd] 'parallel' package changes '.Random.seed'

Henric Winell nilsson.henric at gmail.com
Thu Mar 6 12:54:25 CET 2014


Comments below.

On 2014-03-06 11:17, Henric Winell wrote:
> Hi,
>
> I've implemented parallelization in one of my packages using the
> 'parallel' package -- many thanks for providing it!
>
> In my package I'm importing 'parallel' and so added it to the
> DESCRIPTION file's 'Imports:' field and also added an
> 'importFrom("parallel", ...)' statement in the NAMESPACE file.
>
> Parallelization works nicely, but my package no longer passes any parts
> of its (unparallelized) checks that depend on random number generation,
> e.g., the simulated data in the check suite are no longer the same as
> before parallelization was added.  This seems to be due to 'parallel'
> changing '.Random.seed' when loading its name space:
>
>  > set.seed(1)
>  > rs1 <- .Random.seed
>  > rnorm(1)
> [1] -0.6264538
>  > set.seed(1)
>  > rs2 <- .Random.seed
>  > identical(rs1, rs2)
> [1] TRUE
>  > loadNamespace("parallel")
> <environment: namespace:parallel>
>  > rs3 <- .Random.seed
>  > identical(rs1, rs3)
> [1] FALSE
>  > rnorm(1)
> [1] -0.3262334
>  > set.seed(1)
>  > rs4 <- .Random.seed
>  > identical(rs1, rs4)
> [1] TRUE
>
> I've taken a look at the 'parallel' source code, and in a few places a
> call to 'runif(1)' is issued.  So, what effectively seems to happen when
> 'parallel' is loaded is
>
>  > set.seed(1)
>  > runif(1)
> [1] 0.2655087
>  > rnorm(1)
> [1] -0.3262334

Some digging reveals the cause:  by default no port number is set for the
socket connection, in which case 'parallel' picks a random port in the
11000-11999 range using 'runif(1L)'.  So, by setting the environment
variable R_PARALLEL_PORT, the '.Random.seed' object is no longer touched:

 > Sys.setenv(R_PARALLEL_PORT = 11500)
 > set.seed(1)
 > rs1 <- .Random.seed
 > loadNamespace("parallel")
<environment: namespace:parallel>
 > rs2 <- .Random.seed
 > identical(rs1, rs2)
[1] TRUE

This is handled in the 'initDefaultClusterOptions' function in 'snow.R', 
where line 88 has

port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300)%%1)
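To see what that line does, it can be split into its parts.  This is only
a rough sketch; the helper name 'pick_port' is mine and does not exist in
'snow.R' or 'parallel'.  The uniform draw and a time-based term are added
and reduced modulo 1, so the result falls in [11000, 12000), and it is the
'runif' call that advances '.Random.seed':

```r
# Hypothetical helper mirroring the expression in 'snow.R'; not part of 'parallel'.
pick_port <- function() {
    u  <- stats::runif(1L)            # consumes one draw from the RNG stream
    tm <- unclass(Sys.time()) / 300   # time-based term, differs between sessions
    11000 + 1000 * ((u + tm) %% 1)    # fractional part scaled into the port range
}

set.seed(1)
before <- .Random.seed
port   <- pick_port()
after  <- .Random.seed

port >= 11000 && port < 12000   # the port lies inside the range given above
identical(before, after)        # FALSE: runif() has advanced the seed
```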

It seems to me that we can tread more carefully here.  I've attached a 
trivial patch that

1. Checks whether '.Random.seed' exists
2. If it does:     a) saves '.Random.seed'
                   b) makes the call above
                   c) resets '.Random.seed' to its state in a)
   If it does not: a) makes the call above
                   b) removes '.Random.seed'

In due course I hope someone is interested enough to review it.
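
For readers without the attachment at hand, the idea can be sketched
roughly as below.  This is not the patch verbatim:  the wrapper name
'with_preserved_seed' is hypothetical, and I'm assuming that
'.Random.seed' is looked up in the global environment only.

```r
# Hypothetical wrapper, not the attached patch: evaluate 'expr', then put
# '.Random.seed' in the global environment back the way it was found.
with_preserved_seed <- function(expr) {
    genv <- globalenv()
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed)
        old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
    on.exit({
        if (had_seed) {
            # restore the saved state
            assign(".Random.seed", old_seed, envir = genv)
        } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
            # there was no seed before, so remove the one just created
            rm(list = ".Random.seed", envir = genv)
        }
    })
    force(expr)   # the promise is evaluated here, before on.exit() runs
}

set.seed(1)
rs1 <- .Random.seed
port <- with_preserved_seed(
    11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
)
identical(rs1, .Random.seed)   # TRUE: the seed is back in its saved state
```

Using 'on.exit' means the seed is also restored if the wrapped expression
fails with an error.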


Henric Winell



>
> which reproduces the above.  But is this really necessary?  And more
> importantly (at least to me):  Can it somehow be avoided?
>
> The current state of affairs is a bit unfortunate, since it implies that
> a user just by loading the new parallelized version of my package can no
> longer reproduce any subsequent results depending on random number
> generation (unless a call to 'set.seed' was issued *after* attaching my
> package).
>
> I'd be most grateful for any help that you're able to provide here. Many
> thanks!
>
> Kind regards,
> Henric Winell
>
>
>> sessionInfo()
> R Under development (unstable) (2014-01-26 r64897)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.1.0 parallel_3.1.0 tools_3.1.0
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: snow.R.patch
Type: text/x-patch
Size: 1138 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20140306/e83ca0fc/attachment.bin>

