[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?
Henrik Bengtsson
hb at biostat.ucsf.edu
Sun Nov 3 21:39:04 CET 2013
Hi,
in BiocParallel, is there a suggested (or planned) best practice for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)? I understand that local variables
should be avoided and that it's preferred to put as much as possible in
packages, but that's not always possible or very convenient.
EXAMPLE:
library('BiocParallel')
library('BatchJobs')
# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself ("itself" = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}
# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok,
throw.error = TRUE) :
Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]
# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok,
throw.error = TRUE) :
Errors occurred during execution. First error message:
Error in fib(n): could not find function "fib"
[...]
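The failure above can be reproduced within a single R session, which may help explain it. The following is only a minimal sketch: it assumes that the worker ends up with a copy of fib() whose enclosing environment has no binding named 'fib', and it simulates that by swapping the environment by hand. The names worker_env, fib_on_worker and wrapper are hypothetical and used for illustration only.

```r
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)  # 'fib' is resolved via the function's enclosing environment
}

# Simulated worker (assumption): the enclosing environment sees base R, but no 'fib'
worker_env <- new.env(parent = baseenv())
fib_on_worker <- fib
environment(fib_on_worker) <- worker_env

# Passing the function as an argument binds 'fib' only inside the wrapper's frame
wrapper <- function(n, fib) fib(n)

# The base case works, because no recursive lookup of 'fib' is needed
ok <- wrapper(1, fib = fib_on_worker)

# The recursive case fails, because the body of fib() cannot see the wrapper's argument
msg <- tryCatch(wrapper(3, fib = fib_on_worker),
                error = function(e) conditionMessage(e))
print(ok)
print(msg)
```

Because R uses lexical scoping, the recursive call inside fib() is resolved where fib() was defined, not where it is called from, so the 'fib' argument of the wrapper never reaches it.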
# Workaround; make fib() aware of itself
# (this is something the user needs to do, and it would be very
# hard for BiocParallel et al. to automate. BTW, should all
# recursive functions be implemented this way?).
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib <- sys.function() # Make function aware of itself
  fib(n-2) + fib(n-1)
}
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
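As an alternative to the sys.function() idiom, base R's Recall() calls the currently executing function without looking it up by name. A minimal sketch, again simulating the worker by stripping the enclosing environment (an assumption about what the worker sees; fib_recall is a hypothetical name):

```r
fib_recall <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  Recall(n - 2) + Recall(n - 1)  # calls the running function, whatever its name is
}

# Simulated worker environment (assumption): base R only, no binding for the function
environment(fib_recall) <- new.env(parent = baseenv())

values <- vapply(0:9, function(n) fib_recall(n), numeric(1))
print(values)
```

Note that, per its help page, Recall() does not work correctly when it is itself passed as a function argument (e.g. to the apply family), so it only suits direct recursive calls like the ones here.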
WISHLIST:
Considering the above recursive issue solved, a slightly more explicit
and standardized solution is then:
values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
  for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
  fib(n)
}, BPGLOBALS=list(fib=fib))
Could the above be generalized into something as neat as:
bpExport("fib")
values <- bplapply(0:9, FUN=function(n) {
  BiocParallel::bpImport("fib")
  fib(n)
})
or ideally just (analogously to parallel::clusterExport()):
bpExport("fib")
values <- bplapply(0:9, FUN=fib)
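For illustration, here is a user-side approximation of the idea. It is NOT part of BiocParallel; withGlobals() is a hypothetical helper. The sketch relies on R serializing a closure's non-global enclosing environment together with the function, so the bindings should travel with it. It also re-parents the exported functions, which could break functions that depend on their original closure.

```r
# Hypothetical helper: bind 'globals' into an environment that travels with FUN
withGlobals <- function(FUN, globals) {
  env <- new.env(parent = environment(FUN))
  for (name in names(globals)) {
    value <- globals[[name]]
    # Re-parent closures so that they can see each other (and themselves)
    if (is.function(value) && !is.primitive(value)) environment(value) <- env
    assign(name, value, envir = env)
  }
  environment(FUN) <- env
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

FUN <- withGlobals(function(n) fib(n), globals = list(fib = fib))

# Round trip through serialization, roughly what happens when sending to a worker
FUN_on_worker <- unserialize(serialize(FUN, connection = NULL))
rm(fib)  # mimic a worker where 'fib' is not otherwise defined

values <- vapply(0:9, FUN_on_worker, numeric(1))
print(values)
```

In principle the result could then be passed as FUN to bplapply(), but I have not verified this against BatchJobsParam.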
/Henrik