[R-sig-hpc] Serializing global state from within a namespace to distribute to other workers

Murray Stokely murray at stokely.org
Fri Oct 29 19:39:02 CEST 2010


I have a package which implements a parallelapply function for my
environment, but I'm having trouble moving the package into a NAMESPACE
because when save() serializes the function, it includes the
NAMESPACESXP of the package.
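Here is a minimal way to see this mechanism without the original package. It uses the `stats` namespace as a stand-in for it, and `make_wrapper` is a hypothetical name. A closure created inside a function whose environment is a namespace keeps that function's call frame as its enclosure, and the frame's parent is the namespace. `serialize()`, which uses the same serialization format as `save()`, writes a namespace as a reference by name and version rather than by contents. Reading it back therefore triggers `loadNamespace()`.

```r
# 'make_wrapper' is a hypothetical stand-in for a function defined in a
# package namespace; the 'stats' namespace plays the role of the package.
make_wrapper <- function(FUN, ...) {
  # The returned closure's enclosure is make_wrapper's call frame,
  # which holds FUN and '...'; that frame's parent is the namespace.
  function(x) FUN(x, ...)
}
environment(make_wrapper) <- asNamespace("stats")

wrapper <- make_wrapper(sqrt)

# topenv() walks up from the call frame to the first top-level
# environment, which here is the namespace.
ns <- topenv(environment(wrapper))
print(isNamespace(ns))
print(environmentName(ns))

# A namespace is serialized as a reference (name and version), so the
# ASCII serialization of 'wrapper' contains the namespace name.
txt <- rawToChar(serialize(wrapper, NULL, ascii = TRUE))
print(grepl("stats", txt, fixed = TRUE))
```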

parallelapply <- function(x, FUN, ...) {
 environment(FUN) <- .GlobalEnv   # does not have intended effect
 assign(".GLOBAL.FUN", function(x) { FUN(x, ...) }, envir = .GlobalEnv)
 environment(.GLOBAL.FUN) <- .GlobalEnv   # does not have intended effect
 save(list = ls(envir = .GlobalEnv, all.names = TRUE),
      file = "/tmp/.Rdata",
      envir = .GlobalEnv)
 # Here we distribute the .Rdata file to other workers that load it
 # and then run .GLOBAL.FUN(n).
 # This works great if parallelapply is in a package without a
 # NAMESPACE, but fails on loadNamespace otherwise.
}
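One possible way around this is sketched below. It assumes that `FUN` itself is defined at top level, so its environment is already `.GlobalEnv`, and that its extra arguments are plain data. The idea is to build the worker closure in a fresh environment whose parent is `.GlobalEnv`, and to copy `FUN` and the already-evaluated `...` arguments into it. `.GlobalEnv` is serialized as a marker rather than by contents, so the saved closure no longer points at the package namespace. `make_portable_fun`, `refs_namespace` and `fun_args` are hypothetical names, and `stats` again stands in for the package.

```r
# Hypothetical helper, shown as if it lived in the package namespace.
make_portable_fun <- function(FUN, ...) {
  env <- new.env(parent = globalenv())
  env$FUN      <- FUN        # forces the promise for FUN
  env$fun_args <- list(...)  # evaluates '...' now instead of keeping
                             # promises that point at this call frame
  f <- function(x) do.call(FUN, c(list(x), fun_args))
  environment(f) <- env      # drop the call frame (and the namespace)
  f
}
environment(make_portable_fun) <- asNamespace("stats")  # stand-in namespace

# Hypothetical check: is any environment between the closure and
# .GlobalEnv a namespace?
refs_namespace <- function(f) {
  e <- environment(f)
  while (!identical(e, globalenv()) && !identical(e, emptyenv())) {
    if (isNamespace(e)) return(TRUE)
    e <- parent.env(e)
  }
  FALSE
}

.GLOBAL.FUN <- make_portable_fun(function(x, k) x * k, k = 3)
print(refs_namespace(.GLOBAL.FUN))

# Round trip through save()/load(), as the workers would do.
path <- tempfile(fileext = ".Rdata")
save(.GLOBAL.FUN, file = path)
rm(.GLOBAL.FUN)
load(path)
print(.GLOBAL.FUN(2))
```

This only cleans up the closure's own chain. If `FUN` was itself defined inside the package, its environment is the namespace, and loading it will still pull the package in. In that case the workers need the package, or `FUN` has to avoid depending on package internals.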

This relatively simple distribution technique seems to work well in
practice except for this namespace issue.  Is there any way to more
completely strip the package NAMESPACE out of the environment of the
function to be applied in parallel, so that it can be loaded on the
workers without requiring them to load the same package NAMESPACE
too?  That is problematic because of all the shared object code in
the package that the workers don't need to load, etc.

Or are there other distribution techniques for getting master state to
the workers for embarrassingly parallel R scripts (e.g. Monte Carlo
methods in my case) that I should consider?

        - Murray


---------- Forwarded message ----------
From: Murray Stokely <murray at stokely.org>
Date: Wed, Oct 27, 2010 at 11:35 PM
Subject: Creating truly global variable within function within namespace
To: r-help at r-project.org


I am trying to create a function within a package with a NAMESPACE that
will save() some variables and distribute the .Rdata file to another
computer, where it will be load()ed.  The problem is that this load()
tries to load the namespace of the package that created the .Rdata file
on the original computer, but that package need not be loaded on the
new computer.

This example function is in a package with a namespace:

create.global.function <- function(x, FUN, ...) {
 environment(FUN) <- .GlobalEnv
 assign(".GLOBAL.FUN", function(x) { FUN(x, ...) }, envir = .GlobalEnv)
 environment(.GLOBAL.FUN) <- .GlobalEnv
 save(list = ls(envir = .GlobalEnv, all.names = TRUE),
      file = "/tmp/.Rdata",
      envir = .GlobalEnv)
}
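The `environment(.GLOBAL.FUN) <- .GlobalEnv` line in the function above appears to do nothing, and there is a reason for that. A replacement call with `<-` on a variable that is not local first copies the variable into the function's own frame and then modifies that copy, so the binding in `.GlobalEnv` is left untouched. A replacement with `<<-` modifies the enclosing binding instead. The minimal demonstration below uses the hypothetical names `demo`, `demo2` and `.DEMO.FUN`.

```r
demo <- function() {
  assign(".DEMO.FUN", function(x) x, envir = .GlobalEnv)
  new_env <- new.env()
  # Finds .DEMO.FUN in .GlobalEnv, but assigns the modified copy to a
  # new local variable of the same name.
  environment(.DEMO.FUN) <- new_env
  global_copy <- get(".DEMO.FUN", envir = .GlobalEnv)
  c(local_changed  = identical(environment(.DEMO.FUN), new_env),
    global_changed = identical(environment(global_copy), new_env))
}
res <- demo()
print(res)

demo2 <- function() {
  assign(".DEMO.FUN", function(x) x, envir = .GlobalEnv)
  new_env <- new.env()
  # '<<-' looks up and reassigns .DEMO.FUN in the enclosing
  # environments, so here the global binding is the one that changes.
  environment(.DEMO.FUN) <<- new_env
  identical(environment(get(".DEMO.FUN", envir = .GlobalEnv)), new_env)
}
print(demo2())
```

Even when the global binding is updated, setting the closure's environment to `.GlobalEnv` cuts it off from `FUN` and `...`, which live only in the call frame. Calling `.GLOBAL.FUN` on a worker would then fail unless those values are stored somewhere the closure can still reach.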

If I then quit my session and try to load("/tmp/.Rdata") without
loading the package, load() tries to call loadNamespace() for the
package referenced in the .Rdata file and fails.

Is it possible to fully strip the namespace out of the .GLOBAL.FUN
before I save() it such that it can be loaded into other R instances
without trying to load the namespace?

Thanks for any pointers.

            - Murray
