[Rd] document environment passing in parallel::parLapply

Mon Dec 11 18:45:33 CET 2017

The runtime of parallel::parLapply can depend on variables that the
function passed to parLapply never uses. This is not clearly
documented, so I would like to suggest expanding the relevant
documentation to explain this behaviour.

Consider this example:

parallel_demo <- function(random_values_count) {
  # some_data is never used by dummy_function, but it lives in the
  # same environment in which dummy_function is defined.
  some_data <- runif(random_values_count)
  dummy_function <- function(x) {
    x
  }

  cluster <- parallel::makeCluster(3)
  start <- proc.time()

  # Only the parLapply call itself is timed.
  parallel::parLapply(cluster, 1:3, dummy_function)

  runtime <- proc.time() - start
  parallel::stopCluster(cluster)
  runtime
}
parallel_demo(10)
parallel_demo(100 * 1000 * 1000)

On my machine, the first call to parallel_demo returns a measured
runtime of 0.01 seconds, while the second call returns a measured
runtime of 7.04 seconds.

I could not find clear documentation of the reason for this difference
in runtime in ?parallel::parLapply, in
vignette("parallel", package = "parallel"), or in any other obvious
place.

Based on the observations described above (and on lots of additional
tests), my _assumption_ is that parallel::parLapply passes the whole
environment of its "fun" argument to all cluster nodes, which of
course takes some time. Thus the more data there is in this
environment, the longer this takes, even though the environment data
might not be needed to execute the function "fun".
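
One way to check this assumption without starting a cluster is to look
at how large the function becomes when it is serialized, since
serialize() includes the enclosing environment of a closure. The
following is only a minimal sketch: make_closure is a hypothetical
helper that is not part of the example above, and the sketch assumes
that parLapply sends "fun" to the nodes in serialized form.

```r
# Hypothetical helper: returns a function that does not use some_data,
# but is defined in the same environment as some_data.
make_closure <- function(random_values_count) {
  some_data <- runif(random_values_count)
  function(x) {
    x
  }
}

small_closure <- make_closure(10)
large_closure <- make_closure(1000 * 1000)

# Size in bytes of each function once serialized. The enclosing
# environment, including some_data, is part of the serialized object.
small_size <- length(serialize(small_closure, NULL))
large_size <- length(serialize(large_closure, NULL))

small_size
large_size
```

If the assumption holds, large_size should be somewhat more than 8
million bytes (one million doubles of 8 bytes each), while small_size
stays small.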

For environments with lots of data in them, this can considerably slow
down the computation at hand. At the same time, this behaviour of
passing all data in the environment of "fun" to the cluster nodes is
not clearly documented. The only - rather vague - hint that I found
about this is in the "extended examples" section (specifically on page
13, in section 10.4) of vignette("parallel", package = "parallel").
Furthermore, this behaviour is not something that every R user would
readily expect, in my opinion. Therefore I would like to suggest
expanding the documentation of parallel::parLapply so that it
explicitly states that the environment of "fun" has to be passed to
all cluster nodes, which may take some time.

I spent a considerable amount of time on figuring out why my
parallelization code didn't really speed up my calculations, and I
would like to save others from going through this hassle again. :-)
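
As an illustration of what such a documentation note could point to,
here is a sketch of one possible workaround. It assumes that "fun"
does not need any variable from the environment in which it was
defined, and it relies on the fact that the global environment is
serialized as a reference rather than by content.
make_worker_function is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: builds the function next to a large object, then
# detaches the function from that environment before returning it.
make_worker_function <- function(random_values_count) {
  some_data <- runif(random_values_count)
  dummy_function <- function(x) {
    x
  }
  # Assumption: dummy_function needs nothing from this environment, so
  # its enclosing environment can be replaced by the global environment.
  environment(dummy_function) <- globalenv()
  dummy_function
}

worker_function <- make_worker_function(1000 * 1000)

# Size in bytes of the serialized function; some_data is no longer
# reachable from the function and is therefore not included.
worker_size <- length(serialize(worker_function, NULL))
worker_size
```

A function prepared like this could then be passed as "fun" to
parallel::parLapply. Whether this removes the delay in a given setup
would still have to be measured, and it only works if "fun" really
does not refer to anything in its original environment.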

For the sake of completeness, here is my session info:

> version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 parallel_3.4.3 tools_3.4.3    yaml_2.1.14

Martin