[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?
Ryan
rct at thompsonclan.org
Sun Nov 3 23:38:25 CET 2013
Here's an easy thing we can add to BiocParallel in the short term. The
following code defines a wrapper function "withBPExtraErrorText" that
simply appends an additional message to the end of any error that looks
like it is about a missing variable. We could wrap every evaluation in
a similar tryCatch to at least provide a more informative error message
when a subprocess has a missing variable.
-Ryan
withBPExtraErrorText <- function(expr) {
    tryCatch({
        expr
    }, simpleError = function(err) {
        if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
            ## It is an error due to a variable not found.
            ## The hint is built from two pieces so that no line break
            ## ends up inside the message.
            err$message <- paste0(err$message, ". Maybe you forgot to ",
                "export this variable from the main R session using \"bpexport\"?")
        }
        stop(err)
    })
}
x <- 5
## Succeeds
withBPExtraErrorText(x)
## Fails with more informative error message
withBPExtraErrorText(y)
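
The wrapper above only matches "object ... not found", whereas the failure in the quoted example below is 'could not find function "fib"'. A minimal sketch of a variant that recognizes both messages follows. The name "withBPMissingSymbolHint" is hypothetical and not part of BiocParallel, and the patterns assume R's English error messages, so this would not match in other locales.

```r
## Hypothetical helper (not part of BiocParallel): re-throws any error,
## appending a hint when the message looks like a missing variable or a
## missing function. Assumes English error messages.
withBPMissingSymbolHint <- function(expr) {
    patterns <- c("object '.*' not found",
                  "could not find function \".*\"")
    tryCatch(expr, error = function(err) {
        msg <- conditionMessage(err)
        ## TRUE if any of the patterns matches the error message
        if (any(vapply(patterns, grepl, logical(1), x = msg))) {
            err$message <- paste0(msg, ". Maybe you forgot to export it ",
                "from the main R session?")
        }
        stop(err)
    })
}

## Usage: capture the augmented message instead of stopping
msg <- tryCatch(withBPMissingSymbolHint(noSuchFunction(1)),
                error = function(e) conditionMessage(e))
```

Errors that match neither pattern are passed through unchanged, so the wrapper should be safe to put around arbitrary evaluations.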
On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> An analog to clusterExport is a good idea. To make it even easier, we could
>> have a dynamic environment based on object tables that would catch missing
>> symbols and download them from the parent thread. But maybe there's some
>> benefit to being explicit?
>
> A first step to fully automate this would be to provide some (opt
> in/out) mechanism for code inspection and warn about non-defined
> objects (cf. 'R CMD check'). That is of course major work, but will
> certainly spare the community/users 1000's of hours in troubleshooting
> and the mailing lists from "why doesn't my parallel code work"
> messages. Such protection may be better suited for the 'parallel'
> package though. Unfortunately, it's beyond my skills/time to pull
> such a thing together.
>
> /Henrik
>
>>
>> Michael
>>
>>
>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson <hb at biostat.ucsf.edu>
>> wrote:
>>>
>>> Hi,
>>>
>>> in BiocParallel, are there suggested (or planned) best standards for
>>> making *locally* assigned variables (e.g. functions) available to the
>>> applied function when it runs in a separate R process (which will be
>>> the most common use case)? I understand that local variables
>>> should be avoided and it's preferred to put as much as possible in
>>> packages, but that's not always possible or very convenient.
>>>
>>> EXAMPLE:
>>>
>>> library('BiocParallel')
>>> library('BatchJobs')
>>>
>>> # Here I pick a recursive function to make the problem a bit harder, i.e.
>>> # the function needs to call itself ("itself" = see below)
>>> fib <- function(n=0) {
>>>   if (n < 0) stop("Invalid 'n': ", n)
>>>   if (n == 0 || n == 1) return(1)
>>>   fib(n-2) + fib(n-1)
>>> }
>>>
>>> # Executing in the current R session
>>> cluster.functions <- makeClusterFunctionsInteractive()
>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>> register(bpParams)
>>> values <- bplapply(0:9, FUN=fib)
>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>
>>>
>>> # Executing in a separate R process, where fib() is not defined
>>> # (not specific to BiocParallel)
>>> cluster.functions <- makeClusterFunctionsLocal()
>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>> register(bpParams)
>>> values <- bplapply(0:9, FUN=fib)
>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>> Error in LastError$store(results = results, is.error = !ok,
>>> throw.error = TRUE) :
>>> Errors occurred during execution. First error message:
>>> Error in FUN(...): could not find function "fib"
>>> [...]
>>>
>>>
>>> # The following illustrates that the solution is not always
>>> # straightforward.
>>> # (not specific to BiocParallel; must have been discussed previously)
>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>   fib(n)
>>> }, fib=fib)
>>> Error in LastError$store(results = results, is.error = !ok,
>>> throw.error = TRUE) :
>>> Errors occurred during execution. First error message:
>>> Error in fib(n): could not find function "fib"
>>> [...]
>>>
>>> # Workaround; make fib() aware of itself
>>> # (this is something the user needs to do, and would be very
>>> # hard for BiocParallel et al. to automate. BTW, should all
>>> # recursive functions be implemented this way?).
>>> fib <- function(n=0) {
>>>   if (n < 0) stop("Invalid 'n': ", n)
>>>   if (n == 0 || n == 1) return(1)
>>>   fib <- sys.function() # Make function aware of itself
>>>   fib(n-2) + fib(n-1)
>>> }
>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>   fib(n)
>>> }, fib=fib)
>>>
>>>
>>> WISHLIST:
>>> Considering the above recursive issue solved, a slightly more explicit
>>> and standardized solution is then:
>>>
>>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>>   for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>>>   fib(n)
>>> }, BPGLOBALS=list(fib=fib))
>>>
>>> Could the above be generalized into something as neat as:
>>>
>>> bpExport("fib")
>>> values <- bplapply(0:9, FUN=function(n) {
>>>   BiocParallel::bpImport("fib")
>>>   fib(n)
>>> })
>>>
>>> or ideally just (analogously to parallel::clusterExport()):
>>>
>>> bpExport("fib")
>>> values <- bplapply(0:9, FUN=fib)
>>>
>>> /Henrik
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
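
As a rough illustration of the code-inspection idea Henrik mentions above, the codetools package (which R CMD check itself relies on) can list the global symbols a function refers to. The sketch below flags those that are not found in any attached package, i.e. the ones that would presumably have to be exported to a worker. The helper name "findUnexportedGlobals" is hypothetical and not part of BiocParallel or parallel, and the check is deliberately simplistic: it does not look into the function's enclosing environments or into packages that are not attached.

```r
library(codetools)

## Hypothetical helper: returns the names of global symbols used by
## 'fun' that cannot be found in any attached package (or in base).
findUnexportedGlobals <- function(fun) {
    ## All global functions and variables referenced by 'fun'
    globals <- findGlobals(fun, merge = TRUE)
    ## Start the lookup just below the global environment, so that only
    ## attached packages and base are searched
    attached <- parent.env(globalenv())
    fromPackages <- vapply(globals, exists, logical(1), envir = attached)
    globals[!fromPackages]
}

fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}

## Symbols that a fresh worker process would not have
missingOnWorker <- findUnexportedGlobals(fib)
```

For the recursive fib() from the example, the self-reference "fib" is what should be reported, since everything else it uses lives in base. A check of this kind could be run before dispatching jobs and turned into a warning.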