[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

Ryan rct at thompsonclan.org
Sun Nov 3 23:38:25 CET 2013


Here's an easy thing we can add to BiocParallel in the short term. The 
following code defines a wrapper function "withBPExtraErrorText" that 
simply appends an additional message to the end of any error that looks 
like it is about a missing variable. We could wrap every evaluation in 
a similar tryCatch to at least provide a more informative error message 
when a subprocess has a missing variable.

-Ryan

withBPExtraErrorText <- function(expr) {
    tryCatch({
        expr
    }, simpleError = function(err) {
        if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
            ## It is an error due to a variable not found.
            err$message <- paste0(err$message,
                ". Maybe you forgot to export this variable from the ",
                "main R session using \"bpexport\"?")
        }
        stop(err)
    })
}

x <- 5

## Succeeds
withBPExtraErrorText(x)

## Fails with more informative error message
withBPExtraErrorText(y)
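
To make "wrap every evaluation" a bit more concrete, here is a rough 
sketch of wrapping the worker function itself, so that the hint is added 
wherever the function ends up running. The name "bpWithHint" is made up 
for illustration and is not part of BiocParallel. The sketch also matches 
the 'could not find function' message from the fib() example quoted 
below, and it assumes English error messages. Plain lapply() stands in 
for bplapply() so that it runs in any R session.

```r
## Hypothetical helper (not part of BiocParallel): returns a wrapped
## version of FUN that appends a hint to "missing symbol" errors.
bpWithHint <- function(FUN) {
    force(FUN)
    function(...) {
        tryCatch(FUN(...), error = function(err) {
            msg <- conditionMessage(err)
            ## Assumes English messages; translated messages will not match.
            missingSymbol <- grepl("^object '.*' not found$", msg) ||
                grepl("^could not find function \".*\"$", msg)
            if (missingSymbol) {
                err$message <- paste0(msg,
                    ". Maybe you forgot to export it from the main R session?")
            }
            ## Re-signal the (possibly amended) error.
            stop(err)
        })
    }
}

## A worker that refers to a variable that is not defined anywhere.
f <- function(i) i + undefinedOffset

## lapply() stands in for bplapply(); capture the message instead of failing.
res <- tryCatch(lapply(1:3, bpWithHint(f)), error = conditionMessage)
print(res)
```

Wrapping the function rather than the call means the check travels with 
the function to the worker, which is where the lookup actually fails.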



On Sun Nov  3 14:01:48 2013, Henrik Bengtsson wrote:
> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> An analog to clusterExport is a good idea. To make it even easier, we could
>> have a dynamic environment based on object tables that would catch missing
>> symbols and download them from the parent thread. But maybe there's some
>> benefit to being explicit?
>
> A first step to fully automate this would be to provide some (opt
> in/out) mechanism for code inspection and warn about non-defined
> objects (cf. 'R CMD check').  That is of course major work, but will
> certainly spare the community/users thousands of hours of troubleshooting
> and the mailing lists from "why doesn't my parallel code work"
> messages.  Such protection may be better suited for the 'parallel'
> package though.  Unfortunately, it's beyond my skills/time to pull
> such a thing together.
>
> /Henrik
>
>>
>> Michael
>>
>>
>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson <hb at biostat.ucsf.edu>
>> wrote:
>>>
>>> Hi,
>>>
>>> in BiocParallel, are there suggested (or planned) best practices for
>>> making *locally* assigned variables (e.g. functions) available to the
>>> applied function when it runs in a separate R process (which will be
>>> the most common use case)?  I understand that local variables
>>> should be avoided and it's preferred to put as much as possible in
>>> packages, but that's not always possible or very convenient.
>>>
>>> EXAMPLE:
>>>
>>> library('BiocParallel')
>>> library('BatchJobs')
>>>
>>> # Here I pick a recursive function to make the problem a bit harder, i.e.
>>> # the function needs to call itself ("itself" = see below)
>>> fib <- function(n=0) {
>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>    if (n == 0 || n == 1) return(1)
>>>    fib(n-2) + fib(n-1)
>>> }
>>>
>>> # Executing in the current R session
>>> cluster.functions <- makeClusterFunctionsInteractive()
>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>> register(bpParams)
>>> values <- bplapply(0:9, FUN=fib)
>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>
>>>
>>> # Executing in a separate R process, where fib() is not defined
>>> # (not specific to BiocParallel)
>>> cluster.functions <- makeClusterFunctionsLocal()
>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>> register(bpParams)
>>> values <- bplapply(0:9, FUN=fib)
>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>> Error in LastError$store(results = results, is.error = !ok,
>>> throw.error = TRUE) :
>>>    Errors occurred during execution. First error message:
>>> Error in FUN(...): could not find function "fib"
>>> [...]
>>>
>>>
>>> # The following illustrates that the solution is not always
>>> # straightforward.
>>> # (not specific to BiocParallel; must have been discussed previously)
>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>    fib(n)
>>> }, fib=fib)
>>> Error in LastError$store(results = results, is.error = !ok,
>>> throw.error = TRUE) :
>>>    Errors occurred during execution. First error message:
>>> Error in fib(n): could not find function "fib"
>>> [...]
>>>
>>> # Workaround; make fib() aware of itself
>>> # (this is something the user needs to do, and would be very
>>> #  hard for BiocParallel et al. to automate.  BTW, should all
>>> #  recursive functions be implemented this way?).
>>> fib <- function(n=0) {
>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>    if (n == 0 || n == 1) return(1)
>>>    fib <- sys.function() # Make function aware of itself
>>>    fib(n-2) + fib(n-1)
>>> }
>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>    fib(n)
>>> }, fib=fib)
>>>
>>>
>>> WISHLIST:
>>> Considering the above recursive issue solved, a slightly more explicit
>>> and standardized solution is then:
>>>
>>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>>    for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>>>    fib(n)
>>> }, BPGLOBALS=list(fib=fib))
>>>
>>> Could the above be generalized into something as neat as:
>>>
>>> bpExport("fib")
>>> values <- bplapply(0:9, FUN=function(n) {
>>>    BiocParallel::bpImport("fib")
>>>    fib(n)
>>> })
>>>
>>> or ideally just (analogously to parallel::clusterExport()):
>>>
>>> bpExport("fib")
>>> values <- bplapply(0:9, FUN=fib)
>>>
>>> /Henrik
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
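
Regarding the code inspection idea quoted above: a very rough sketch of 
what such a check might look like, using findGlobals() from the 
codetools package (a recommended package that ships with R). The helper 
name "findLocalGlobals" is made up for illustration and exists in neither 
BiocParallel nor parallel. An explicit environment stands in for the 
user's main R session so that the example is self-contained.

```r
## Hypothetical helper: list the free symbols of FUN that are bound
## directly in 'envir', i.e. the ones a fresh worker process would lack.
findLocalGlobals <- function(FUN, envir = globalenv()) {
    globals <- codetools::findGlobals(FUN, merge = TRUE)
    isLocal <- vapply(globals, exists, logical(1),
                      envir = envir, inherits = FALSE)
    unname(globals[isLocal])
}

## Stand-in for the main R session with two locally assigned objects.
session <- new.env()
session$helper <- function(x) x + 1
session$offset <- 10

## A worker function that silently depends on both of them.
worker <- function(n) helper(n) + offset
environment(worker) <- session

needed <- findLocalGlobals(worker, envir = session)
print(needed)
```

The resulting names could be used to warn the user before submitting 
jobs, or be handed to something like parallel::clusterExport(). Note that 
this is static analysis, so symbols reached through get(), eval() and 
friends would be missed.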


