[R-sig-hpc] Serializing global state from within a namespace to distribute to other workers

Murray Stokely murray at stokely.org
Tue Nov 9 22:50:25 CET 2010


On Tue, Nov 9, 2010 at 12:50 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> This is only an oblique follow-up - are there any tools for finding out whether
> the serialisation of an object will cause the serialisation of its environment?
> I've looked around, for example in the codetools package, but do not see
> anything obvious. I've also been hit by objects being serialised (both for snow
> and even just for save() - which I think is the underlying mechanism here)
> ending up about two orders of magnitude larger than the object.size() reported.

Printing a function will tell you whether an environment is associated
with it that will need to be serialized: print() adds an
<environment: ...> line whenever the function's environment is not the
global environment.
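
For a programmatic version of the same check, here is a minimal sketch.
The helper names (make_closure, shows_env, serialized_size) are made up
for illustration and are not from codetools or any other package. It
compares what print() shows with the number of bytes serialize()
actually produces, since object.size() does not count a function's
environment.

```r
# A closure created inside a function keeps that function's frame alive,
# including variables the closure never uses.
make_closure <- function() {
  big <- numeric(1e6)            # about 8 MB held in the enclosing frame
  function(y) y + 1
}
f <- make_closure()

# For contrast, a function whose enclosure is the global environment.
g <- function(y) y + 1
environment(g) <- globalenv()

# TRUE if print() reports an "<environment: ...>" line for the function.
shows_env <- function(fun) {
  any(grepl("^<environment:", capture.output(print(fun))))
}

# Number of bytes that serialize() writes for the object.
serialized_size <- function(obj) {
  length(serialize(obj, connection = NULL))
}

shows_env(f)          # environment line present
shows_env(g)          # no environment line
serialized_size(f)    # includes the captured 'big' vector
serialized_size(g)    # small: the global environment is written by reference
object.size(f)        # does not include the enclosing environment
```

Comparing serialized_size() with object.size() is a crude test, but it
catches the "two orders of magnitude" case directly.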

I ended up in gdb, sprinkling Rprintf calls around serialize.c and
loadsave.c to try to understand this better.  In the end I changed the
assign statement to use substitute, to get around the fact that a
NAMESPACE is associated with FUN no matter how hard I try to strip it
off:

parallelapply <- function(x, FUN, ...) {
  environment(FUN) <- .GlobalEnv   # does not have intended effect
  assign(".GLOBAL.FUN",
         eval.parent(substitute(function(y) { FUN(y, ...) })),
         envir = .GlobalEnv)
  environment(.GLOBAL.FUN) <- .GlobalEnv   # does not have intended effect
  save(list = ls(envir = .GlobalEnv, all.names = TRUE),
       file = "/tmp/.Rdata")
  # Here we distribute the .Rdata file to other workers, which load it
  # and then run .GLOBAL.FUN(n).
  # This works great if parallelapply is in a package without a
  # NAMESPACE, but fails on loadNamespace otherwise.
}
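
One thing that may contribute to the second "does not have intended
effect" comment: a replacement call such as environment(x) <- value,
made inside a function, rebinds x in that function's own frame and
leaves a global binding of the same name untouched. The sketch below
shows only this scoping behaviour and does not address the NAMESPACE
problem itself. The names .DEMO.FUN, reset_locally and reset_globally
are hypothetical.

```r
# Hypothetical stand-in for a function stored in the global environment,
# created with a non-global enclosure via local().
assign(".DEMO.FUN", local(function(y) y + 1), envir = globalenv())

# Helper: is the enclosure of the *global* binding the global environment?
global_copy_is_global <- function() {
  identical(environment(get(".DEMO.FUN", envir = globalenv())), globalenv())
}

reset_locally <- function() {
  # The replacement call creates a modified copy in this frame only.
  environment(.DEMO.FUN) <- globalenv()
  identical(environment(.DEMO.FUN), globalenv())
}

reset_globally <- function() {
  # Fetch, modify, and write back to the global environment explicitly.
  fun <- get(".DEMO.FUN", envir = globalenv())
  environment(fun) <- globalenv()
  assign(".DEMO.FUN", fun, envir = globalenv())
  invisible(NULL)
}

local_result <- reset_locally()            # the local copy was changed
after_local  <- global_copy_is_global()    # the global binding was not
reset_globally()
after_global <- global_copy_is_global()    # now the global binding is changed
```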

              - Murray


