[Rd] [External] setting .libPaths() with parallel::clusterCall

Mark van der Loo m@rk@v@nder|oo @end|ng |rom gm@||@com
Wed Dec 23 10:05:08 CET 2020


Dear Luke,

Thank you, this makes perfect sense.

I find it quite hard to express this issue in a way that is both compact
and understandable.
In any case, below you will find a proposed update to the documentation.

Thank you again for all your work,
Mark



Index: src/library/parallel/man/clusterApply.Rd
===================================================================
--- src/library/parallel/man/clusterApply.Rd (revision 79673)
+++ src/library/parallel/man/clusterApply.Rd (working copy)
@@ -136,6 +136,15 @@
   more efficient than \code{parApply} but do less post-processing of the
   result.

+  Functions with a \code{fun} or \code{FUN} parameter send a serialized
+  copy of that argument from the main process to each worker node.
+  When the argument is a function, this is usually equivalent to calling
+  the function of the same name on the worker node, except when the
+  function modifies its enclosing environment by assignment.
+  A notable example is \code{\link{.libPaths}}. To ensure that each
+  worker calls its own copy of such a function, and hence modifies its
+  own enclosing environment, pass the name of the function as a string.
+
   A chunk size of \code{0} with static scheduling uses the default (one
   chunk per node).  With dynamic scheduling, chunk size of \code{0} has the
   same effect as \code{1} (one invocation of \code{FUN}/\code{fun} per

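To make the proposed paragraph concrete, here is a minimal base-R sketch of the mechanism. It assumes that sending a function value to a worker behaves like a serialize()/unserialize() round trip, which simplifies what clusterCall actually does. make_path_store(), store and remote_copy are hypothetical names; make_path_store() only mimics the way .libPaths keeps its state in a private enclosing environment and updates it with <<-.

```r
# Hypothetical stand-in for .libPaths: a closure that keeps its state in a
# private enclosing environment and updates it with `<<-`.
make_path_store <- function() {
  paths <- character(0)
  function(new) {
    if (!missing(new)) paths <<- new  # assigns into the enclosing environment
    paths
  }
}
store <- make_path_store()

# Assumption: this round trip approximates shipping a function value to a
# worker. The enclosing environment is not the global environment or a
# namespace, so it is serialized by value and the copy gets its own.
remote_copy <- unserialize(serialize(store, NULL))

remote_copy("./tmplib")  # updates the copy's environment only
remote_copy()            # "./tmplib"
store()                  # character(0): the original is untouched
```

Because the enclosing environment travels with the function by value, the assignment lands in the copy. As far as I can tell, base R's .libPaths is built the same way (a function created inside local() that assigns with <<-), which is why calling a shipped copy on a worker leaves that worker's own library paths unchanged.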

On Tue, Dec 22, 2020 at 2:37 PM <luke-tierney using uiowa.edu> wrote:

> On Tue, 22 Dec 2020, Mark van der Loo wrote:
>
> > Dear all,
> >
> > It is not possible to set library paths on worker nodes with
> > parallel::clusterCall (or snow::clusterCall) and I wonder if this is
> > intended behavior.
> >
> > Example.
> >
> > library(parallel)
> > libdir <- "./tmplib"
> > if (!dir.exists(libdir)) dir.create(libdir)
> >
> > cl <- makeCluster(2)
> > clusterCall(cl, .libPaths, c(libdir, .libPaths()) )
> >
> > The output is as expected with the extra libdir returned for each worker
> > node. However, running
> >
> > clusterEvalQ(cl, .libPaths())
> >
> > shows that the library paths have not been set.
>
> Use this:
>
>      clusterCall(cl, ".libPaths", c(libdir, .libPaths()) )
>
> This will find the function .libPaths on the workers.
>
> Your clusterCall sends across a serialized copy of your process'
> .libPaths and calls that. Usually that is equivalent to calling the
> function found by the name you used on the workers, but not when the
> function has an enclosing environment that the function modifies by
> assignment.
>
> Alternate implementations of .libPaths that are more
> serialization-friendly are possible in principle but probably not
> practical given limitations of the base package.
>
> The distinction between providing a function value or a character
> string as the function argument to clusterCall and others could
> probably use a paragraph in the help file; happy to consider a patch
> if anyone wants to take a crack at it.
>
> Best,
>
> luke
>
> >
> > If this is indeed a bug, I'm happy to file it on Bugzilla. Tested on R
> > 4.0.3 and r-devel.
> >
> > Best,
> > Mark
> > PS: a workaround is documented here:
> >
> > https://www.markvanderloo.eu/yaRb/2020/12/17/how-to-set-library-path-on-a-parallel-r-cluster/
> >
> >
> >> sessionInfo()
> > R Under development (unstable) (2020-12-21 r79668)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04.1 LTS
> >
> > Matrix products: default
> > BLAS:   /home/mark/projects/Rdev/R-devel/lib/libRblas.so
> > LAPACK: /home/mark/projects/Rdev/R-devel/lib/libRlapack.so
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>

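A sketch of Luke's suggested fix applied to the original example, assuming a local PSOCK cluster whose workers share the main process's file system. The library directory is placed under tempdir() instead of ./tmplib only to keep the example self-contained, and worker_paths is an illustrative name.

```r
library(parallel)

# Create the extra library directory (tempdir() keeps the example
# from writing into the working directory).
libdir <- file.path(tempdir(), "tmplib")
if (!dir.exists(libdir)) dir.create(libdir)

cl <- makeCluster(2)

# Passing the function *value*: each worker receives a serialized copy of
# .libPaths; per the report above, the workers' own paths stay unchanged.
clusterCall(cl, .libPaths, c(libdir, .libPaths()))

# Passing the function *name*: each worker looks up and calls its own
# .libPaths, so its own enclosing environment is updated.
clusterCall(cl, ".libPaths", c(libdir, .libPaths()))

# Each worker should now list libdir first.
worker_paths <- clusterEvalQ(cl, .libPaths())
stopCluster(cl)
```

The first clusterCall is kept only for contrast: it returns the expected paths but, as reported above, does not change what clusterEvalQ(cl, .libPaths()) shows afterwards.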