[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

Henrik Bengtsson hb at biostat.ucsf.edu
Sun Nov 3 21:39:04 CET 2013


Hi,

in BiocParallel, is there a suggested (or planned) best practice for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)?  I understand that local variables
should be avoided and that it's preferred to put as much as possible in
packages, but that's not always possible or very convenient.

EXAMPLE:

library('BiocParallel')
library('BatchJobs')

# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself ("itself" = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)


# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]


# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in fib(n): could not find function "fib"
[...]
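
As far as I understand it, the reason is that the recursive calls inside
fib() look up the name 'fib' in the function's enclosing environment, not
among the arguments of the wrapper.  On the worker that enclosing
environment is the worker's global environment, where 'fib' is not
defined.  A minimal sketch that mimics this in a single R session (the
stripped-down environment is only my stand-in for a fresh worker, and
'fib_on_worker' is just an illustrative name):

```r
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Mimic a fresh worker: a copy of fib() whose enclosing environment
# has no binding named 'fib' (assumption: this stands in for the
# worker's empty global environment).
fib_on_worker <- fib
environment(fib_on_worker) <- new.env(parent = baseenv())

# The base case works, because no recursive lookup is needed
base_case <- fib_on_worker(1)

# The recursive case fails when it tries to look up 'fib'
msg <- tryCatch(fib_on_worker(5), error = function(e) conditionMessage(e))
print(msg)
```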

# Workaround; make fib() aware of itself
# (this is something the user needs to do, and would be very
#  hard for BiocParallel et al. to automate.  BTW, should all
#  recursive functions be implemented this way?).
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib <- sys.function() # Make function aware of itself
  fib(n-2) + fib(n-1)
}
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
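
An alternative that avoids the name lookup altogether is base R's
Recall(), which calls the currently executing function again.  A sketch,
where 'fib_recall' and the stripped-down environment are my own
illustrative stand-ins (the latter again mimicking a worker without a
global 'fib'):

```r
fib_recall <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  # Recall() re-invokes the running function without needing its name
  Recall(n-2) + Recall(n-1)
}

# Mimic a worker where no function named 'fib_recall' can be found
fib_on_worker <- fib_recall
environment(fib_on_worker) <- new.env(parent = baseenv())

values <- sapply(0:9, fib_on_worker)
print(values)
```

This only helps for direct self-recursion; it does nothing for other
local helper functions that the applied function may depend on.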


WISHLIST:
Considering the above recursive issue solved, a slightly more explicit
and standardized solution is then:

values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
  for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
  fib(n)
}, BPGLOBALS=list(fib=fib))
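
A variant of the same idea that needs no extra argument is to put the
exported objects in a private environment and make that the enclosure
of the applied function, so that they travel along when the function is
serialized.  A rough sketch, assuming the backend ships FUN via
serialize()/saveRDS(); 'bundle_globals' is a hypothetical helper of
mine, not something BiocParallel provides:

```r
# Hypothetical helper (not part of BiocParallel): copy the listed objects
# into a private environment and make it the enclosure of FUN.
bundle_globals <- function(FUN, globals = list()) {
  env <- list2env(globals, parent = globalenv())
  # Re-home exported closures so that they can find each other (and
  # themselves); this discards whatever enclosure they had before.
  for (name in names(globals)) {
    obj <- env[[name]]
    if (is.function(obj) && !is.primitive(obj)) {
      environment(obj) <- env
      assign(name, obj, envir = env)
    }
  }
  environment(FUN) <- env
  FUN
}

fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

FUN <- bundle_globals(function(n) fib(n), globals = list(fib = fib))

# Mimic shipping FUN to a worker: round-trip it through serialize()
# and remove the local fib() so that it cannot be found by accident.
FUN_on_worker <- unserialize(serialize(FUN, NULL))
rm(fib)
values <- sapply(0:9, FUN_on_worker)
print(values)
```

The caveat is that re-homing changes the enclosure of the exported
functions, which is wrong for closures that rely on their original one.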

Could the above be generalized into something as neat as:

bpExport("fib")
values <- bplapply(0:9, FUN=function(n) {
  BiocParallel::bpImport("fib")
  fib(n)
})

or ideally just (analogously to parallel::clusterExport()):

bpExport("fib")
values <- bplapply(0:9, FUN=fib)
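
For comparison, this is roughly what the analogous pattern looks like
with the 'parallel' package today (a sketch only; the two-worker local
cluster is an arbitrary choice for illustration):

```r
library(parallel)

fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Start two worker processes on the local machine
cl <- makeCluster(2)

# Copy fib() into the global environment of each worker.  The 'envir'
# argument says where to find it locally; at top level this is the
# same as the default (.GlobalEnv).
clusterExport(cl, "fib", envir = environment())

values <- parLapply(cl, 0:9, fib)
stopCluster(cl)

print(unlist(values))
```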

/Henrik


