[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?

Ryan rct at thompsonclan.org
Mon Nov 4 00:04:40 CET 2013


Another easy step we could take: if FUN is a function defined in the 
user's workspace, we automatically export that function under the same 
name in the children. This would make recursive functions just work, but 
it might be a bit too magical.
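
A rough sketch of how that lookup might work, using only base R. The 
helper name findWorkspaceNames is hypothetical and not part of 
BiocParallel; it just finds the name(s) under which FUN is bound in a 
given environment, so that the same name could be exported to the children.

```r
## Hypothetical helper (not a BiocParallel function): return the names in
## `env` that are bound to exactly the function `FUN`.
findWorkspaceNames <- function(FUN, env = globalenv()) {
    candidates <- ls(env, all.names = TRUE)
    isMatch <- vapply(candidates, function(name) {
        identical(get(name, envir = env, inherits = FALSE), FUN)
    }, logical(1))
    candidates[isMatch]
}

## Usage: a separate environment stands in for the user's workspace.
workspace <- new.env()
workspace$fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}
findWorkspaceNames(workspace$fib, workspace)
```

If the lookup returns no name (e.g. FUN is an anonymous function), nothing 
would be exported, so the behaviour would stay as it is today.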

On 11/3/13, 2:38 PM, Ryan wrote:
> Here's an easy thing we can add to BiocParallel in the short term. The 
> following code defines a wrapper function "withBPExtraErrorText" that 
> simply appends an additional message to the end of any error that 
> looks like it is about a missing variable. We could wrap every 
> evaluation in a similar tryCatch to at least provide a more 
> informative error message when a subprocess has a missing variable.
>
> -Ryan
>
> withBPExtraErrorText <- function(expr) {
>    tryCatch({
>        expr
>    }, simpleError = function(err) {
>        if (grepl("^object '(.*)' not found$", err$message, perl=TRUE)) {
>            ## It is an error due to a variable not found.
>            err$message <- paste0(err$message,
>                ". Maybe you forgot to export this variable from the ",
>                "main R session using \"bpexport\"?")
>        }
>        stop(err)
>    })
> }
>
> x <- 5
>
> ## Succeeds
> withBPExtraErrorText(x)
>
> ## Fails with more informative error message
> withBPExtraErrorText(y)
>
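
The wrapper above only matches "object '...' not found", whereas the 
failure in the fib() example further down the thread is of the form 
'could not find function "..."'. A possible variant that covers both 
cases is sketched here. The name withMissingSymbolHint and the hint text 
are hypothetical, and the patterns assume English error messages.

```r
## Hypothetical variant of withBPExtraErrorText that also recognises
## missing functions. Assumes R's English error messages.
withMissingSymbolHint <- function(expr) {
    patterns <- c("^object '.*' not found$",
                  "^could not find function \".*\"$")
    tryCatch(expr, error = function(err) {
        msg <- conditionMessage(err)
        isMissing <- any(vapply(patterns, grepl, logical(1), x = msg))
        if (isMissing) {
            ## Append the hint, then re-signal the modified condition.
            err$message <- paste0(msg, ". Maybe you forgot to export this ",
                                  "symbol from the main R session?")
        }
        stop(err)
    })
}
```

Errors that do not match either pattern are re-signalled unchanged.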
>
>
> On Sun Nov  3 14:01:48 2013, Henrik Bengtsson wrote:
>> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>> An analog to clusterExport is a good idea. To make it even easier, we
>>> could have a dynamic environment based on object tables that would
>>> catch missing symbols and download them from the parent thread. But
>>> maybe there's some benefit to being explicit?
>>
>> A first step to fully automate this would be to provide some (opt
>> in/out) mechanism for code inspection and warn about non-defined
>> objects (cf. 'R CMD check').  That is of course major work, but will
>> certainly spare the community/users 1000's of hours in troubleshooting
>> and the mailing lists from "why doesn't my parallel code work"
>> messages.  Such protection may be better suited for the 'parallel'
>> package though.  Unfortunately, it's beyond my skills/time to pull
>> such a thing together.
>>
>> /Henrik
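
As a rough illustration of the code-inspection idea, the codetools 
package (which 'R CMD check' also builds on) can list the global symbols 
a function refers to. The helper below is a hypothetical sketch, not an 
existing BiocParallel or parallel feature: it reports the globals that 
are bound in the user's workspace and would therefore be candidates for 
a warning or an export.

```r
library(codetools)

## Hypothetical helper: which globals used by FUN are defined directly in
## `workspace` (and so would be absent from a fresh worker process)?
workspaceGlobals <- function(FUN, workspace = globalenv()) {
    globals <- findGlobals(FUN, merge = TRUE)
    inWorkspace <- vapply(globals, exists, logical(1),
                          envir = workspace, inherits = FALSE)
    globals[inWorkspace]
}

## Usage: a separate environment stands in for the user's workspace.
workspace <- new.env()
workspace$fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}
workspaceGlobals(workspace$fib, workspace)
```

Static inspection like this cannot see symbols that are only looked up 
dynamically (e.g. via get()), so it would be a warning aid at best.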
>>
>>>
>>> Michael
>>>
>>>
>>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson <hb at biostat.ucsf.edu>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> in BiocParallel, are there suggested (or planned) best standards for
>>>> making *locally* assigned variables (e.g. functions) available to the
>>>> applied function when it runs in a separate R process (which will be
>>>> the most common use case)?  I understand that local variables
>>>> should be avoided and that it's preferred to put as much as possible in
>>>> packages, but that's not always possible or very convenient.
>>>>
>>>> EXAMPLE:
>>>>
>>>> library('BiocParallel')
>>>> library('BatchJobs')
>>>>
>>>> # Here I pick a recursive function to make the problem a bit harder,
>>>> # i.e. the function needs to call itself ("itself" = see below)
>>>> fib <- function(n=0) {
>>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>>    if (n == 0 || n == 1) return(1)
>>>>    fib(n-2) + fib(n-1)
>>>> }
>>>>
>>>> # Executing in the current R session
>>>> cluster.functions <- makeClusterFunctionsInteractive()
>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>> register(bpParams)
>>>> values <- bplapply(0:9, FUN=fib)
>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>>
>>>>
>>>> # Executing in a separate R process, where fib() is not defined
>>>> # (not specific to BiocParallel)
>>>> cluster.functions <- makeClusterFunctionsLocal()
>>>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>>>> register(bpParams)
>>>> values <- bplapply(0:9, FUN=fib)
>>>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>>>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>>> Error in LastError$store(results = results, is.error = !ok,
>>>> throw.error = TRUE) :
>>>>    Errors occurred during execution. First error message:
>>>> Error in FUN(...): could not find function "fib"
>>>> [...]
>>>>
>>>>
>>>> # The following illustrates that the solution is not always
>>>> # straightforward.
>>>> # (not specific to BiocParallel; must have been discussed previously)
>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>    fib(n)
>>>> }, fib=fib)
>>>> Error in LastError$store(results = results, is.error = !ok,
>>>> throw.error = TRUE) :
>>>>    Errors occurred during execution. First error message:
>>>> Error in fib(n): could not find function "fib"
>>>> [...]
>>>>
>>>> # Workaround; make fib() aware of itself
>>>> # (this is something the user needs to do, and would be very
>>>> #  hard for BiocParallel et al. to automate.  BTW, should all
>>>> #  recursive functions be implemented this way?).
>>>> fib <- function(n=0) {
>>>>    if (n < 0) stop("Invalid 'n': ", n)
>>>>    if (n == 0 || n == 1) return(1)
>>>>    fib <- sys.function() # Make function aware of itself
>>>>    fib(n-2) + fib(n-1)
>>>> }
>>>> values <- bplapply(0:9, FUN=function(n, fib) {
>>>>    fib(n)
>>>> }, fib=fib)
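
A related base R option to the sys.function() workaround is Recall(), 
which calls the currently executing function regardless of the name it 
is bound to. The sketch below only shows the local behaviour; whether it 
is preferable inside bplapply() is not something this thread establishes.

```r
## Recursive function that does not depend on its own name.
fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    Recall(n - 2) + Recall(n - 1)
}

## The recursion still works after the original name is gone.
fibCopy <- fib
rm(fib)
values <- lapply(0:9, FUN = function(n, f) f(n), f = fibCopy)
```

Note that Recall() has to be called directly in the function body; it 
does not work when passed as an argument to another function such as 
lapply().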
>>>>
>>>>
>>>> WISHLIST:
>>>> Considering the above recursive issue solved, a slightly more explicit
>>>> and standardized solution is then:
>>>>
>>>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>>>    for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>>>>    fib(n)
>>>> }, BPGLOBALS=list(fib=fib))
>>>>
>>>> Could the above be generalized into something as neat as:
>>>>
>>>> bpExport("fib")
>>>> values <- bplapply(0:9, FUN=function(n) {
>>>>    BiocParallel::bpImport("fib")
>>>>    fib(n)
>>>> })
>>>>
>>>> or ideally just (analogously to parallel::clusterExport()):
>>>>
>>>> bpExport("fib")
>>>> values <- bplapply(0:9, FUN=fib)
>>>>
>>>> /Henrik
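
For comparison, this is roughly what the explicit-export pattern looks 
like with the existing parallel package that the wishlist refers to. 
bpExport() itself is only proposed in this thread; the code below uses 
parallel::clusterExport() and assumes two local worker processes can be 
started.

```r
library(parallel)

fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}

## Start two worker R processes; fib() is not defined there yet.
cl <- makeCluster(2)

## Copy fib() from the current environment into each worker's workspace.
clusterExport(cl, "fib", envir = environment())

## The recursive calls now resolve on the workers.
values <- parLapply(cl, 0:9, fib)

stopCluster(cl)
```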
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>


