[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?
Ryan
rct at thompsonclan.org
Mon Nov 4 02:28:18 CET 2013
I guess all we need to do is to detect whether a function would try to
access a free variable in the user's workspace, and warn/error if so.
It looks like CodeDepends could do that. I could try to come up with an
implementation. I guess we would add CodeDepends as an optional
dependency for BiocParallel, and only do the checks if CodeDepends is
available.
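A minimal sketch of such a check, using codetools::findGlobals() (codetools ships with R) in place of CodeDepends. warnWorkspaceGlobals() is a hypothetical name, not a BiocParallel function. It warns about free symbols of FUN that are defined in the user's workspace, on the assumption that these are exactly what a fresh worker process lacks. Because it is a static check, it cannot see names reached through get() and similar tricks.

```r
library(codetools)  # recommended package shipped with R

## Hypothetical helper (not part of BiocParallel): warn when FUN refers to
## free symbols that are defined in the user's workspace (`envir`).
warnWorkspaceGlobals <- function(FUN, envir = globalenv()) {
  ## all free symbols FUN references, variables and functions alike
  globals <- findGlobals(FUN, merge = TRUE)
  ## keep those bound directly in the workspace; a worker process started
  ## from scratch will not have them
  inWorkspace <- vapply(globals, exists, logical(1),
                        envir = envir, inherits = FALSE)
  found <- globals[inWorkspace]
  if (length(found) > 0L)
    warning("FUN uses workspace objects that workers may not see: ",
            paste(found, collapse = ", "), call. = FALSE)
  invisible(found)
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
warnWorkspaceGlobals(fib)  # warns about 'fib'
```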
On Sun Nov 3 17:10:45 2013, Gabriel Becker wrote:
> Henrik,
>
> See https://github.com/duncantl/CodeDepends (as used by
> https://github.com/gmbecker/RCacheSuite). It will identify necessarily
> defined symbols (input variables) for code that is not doing certain
> tricks (e.g. get(), mixing data.frame columns and global variables in
> formulas, etc.).
>
> Tierney's codetools package also does things along these lines but
> there are some situations where it has trouble. I can give more detail
> if desired.
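To make the get() caveat concrete: static analysis only sees symbols that appear in the code, so a name passed to get() as a character string is invisible to it. A small illustration with codetools::findGlobals(); the function g is made up for the example.

```r
library(codetools)

## 'fib' appears only as a character string, so static analysis misses it
g <- function(n) {
  f <- get("fib")
  f(n)
}
findGlobals(g)             # reports 'get', but not 'fib'
"fib" %in% findGlobals(g)
## [1] FALSE
```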
>
> ~G
>
>
> On Sun, Nov 3, 2013 at 3:04 PM, Ryan <rct at thompsonclan.org
> <mailto:rct at thompsonclan.org>> wrote:
>
> Another easy step we could take: if FUN is a function in the user's
> workspace, we automatically export that function under the same name
> in the children. This would make recursive functions just work, but it
> might be a bit too magical.
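One way to get this effect without touching the workers' global environments is to give FUN a private enclosing environment in which it is bound to its own name. Because that environment is not the global environment, it is serialized together with the function. A minimal sketch; exportSelf() is a hypothetical helper, not an existing BiocParallel function, and removing fib from the workspace only simulates a worker that never saw it.

```r
## Hypothetical helper: bind FUN to `name` inside a new environment that
## encloses FUN, so FUN can call itself by that name wherever it runs.
exportSelf <- function(FUN, name) {
  env <- new.env(parent = environment(FUN))
  environment(FUN) <- env
  assign(name, FUN, envir = env)  # the stored copy also encloses `env`
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

fibSelf <- exportSelf(fib, "fib")
rm(fib)        # simulate a worker where 'fib' is not defined globally
fibSelf(9)
## [1] 55
```

With BiocParallel this would look like bplapply(0:9, FUN = exportSelf(fib, "fib")); doing the wrapping automatically inside bplapply() is the part that may be too magical.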
>
>
> On 11/3/13, 2:38 PM, Ryan wrote:
>
> Here's an easy thing we can add to BiocParallel in the short
> term. The following code defines a wrapper function
> "withBPExtraErrorText" that simply appends an additional
> message to the end of any error that looks like it is about a
> missing variable. We could wrap every evaluation in a similar
> tryCatch to at least provide a more informative error message
> when a subprocess has a missing variable.
>
> -Ryan
>
> withBPExtraErrorText <- function(expr) {
>   tryCatch({
>     expr
>   }, error = function(err) {
>     if (grepl("^object '(.*)' not found$", conditionMessage(err),
>               perl=TRUE)) {
>       ## It is an error due to a variable not found.
>       err$message <- paste0(conditionMessage(err),
>                             ". Maybe you forgot to export this variable",
>                             " from the main R session using \"bpexport\"?")
>     }
>     stop(err)
>   })
> }
>
> x <- 5
>
> ## Succeeds
> withBPExtraErrorText(x)
>
> ## Fails with more informative error message
> withBPExtraErrorText(y)
>
>
>
> On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
>
> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
> <lawrence.michael at gene.com
> <mailto:lawrence.michael at gene.com>> wrote:
>
> An analog to clusterExport is a good idea. To make it even easier, we
> could have a dynamic environment based on object tables that would
> catch missing symbols and download them from the parent thread. But
> maybe there's some benefit to being explicit?
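A rough pure-R approximation of catching missing symbols and fetching them on demand (object tables themselves are not used here): evaluate the function, and whenever it fails with R's "object ... not found" or "could not find function" error, look the name up in the parent and retry. evalWithFetch() and the `fetch` callback, which stands in for a request to the master process, are hypothetical. Each retry re-runs FUN from the start, and matching on English error text breaks when messages are translated. Both are arguments for being explicit instead.

```r
## Hypothetical sketch: run FUN(x); on a missing-symbol error, fetch the
## symbol with `fetch(name)`, store it in FUN's enclosure, and try again.
evalWithFetch <- function(FUN, x, fetch, maxTries = 10L) {
  env <- new.env(parent = environment(FUN))
  environment(FUN) <- env
  patterns <- c("^object '([^']+)' not found$",
                "^could not find function \"([^\"]+)\"$")
  for (i in seq_len(maxTries)) {
    res <- tryCatch(list(value = FUN(x)), error = function(e) e)
    if (!inherits(res, "error"))
      return(res$value)
    msg <- conditionMessage(res)
    hit <- patterns[vapply(patterns, grepl, logical(1), x = msg)]
    if (length(hit) == 0L)
      stop(res)                    # some other error: re-signal it
    name <- sub(hit[1L], "\\1", msg)
    assign(name, fetch(name), envir = env)
  }
  stop("gave up after ", maxTries, " attempts")
}

master <- new.env()                # stands in for the main R session
assign("k", 10, envir = master)
assign("addOne", function(v) v + 1, envir = master)

FUN <- function(n) addOne(n * k)   # neither 'addOne' nor 'k' is defined here
evalWithFetch(FUN, 3, fetch = function(name) get(name, envir = master))
## [1] 31
```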
>
>
> A first step to fully automate this would be to provide some
> (opt-in/opt-out) mechanism for code inspection that warns about
> undefined objects (cf. 'R CMD check'). That is of course major work,
> but it would certainly spare the community/users thousands of hours of
> troubleshooting, and the mailing lists many "why doesn't my parallel
> code work?" messages. Such protection may be better suited for the
> 'parallel' package, though. Unfortunately, it's beyond my skills/time
> to pull such a thing together.
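R CMD check gets its "no visible global function definition" notes from codetools::checkUsage(). A rough sketch of the same idea for parallel code, assuming a worker has the same attached packages but an empty workspace: point a copy of FUN at the environment just below the global environment on the search path, so workspace objects are invisible, and collect what checkUsage() reports. checkForWorkers() is a hypothetical name; the check inherits codetools' blind spots (get(), formulas) and assumes FUN was defined at top level.

```r
library(codetools)

## Hypothetical helper: collect checkUsage() notes for FUN with the user's
## workspace hidden, i.e. roughly what a worker with the same attached
## packages but an empty global environment would be missing.
checkForWorkers <- function(FUN, name = "FUN") {
  ## parent.env(globalenv()) is the most recently attached package; symbol
  ## lookup from there continues through the search path to base, but
  ## never visits the workspace
  environment(FUN) <- parent.env(globalenv())
  notes <- character()
  checkUsage(FUN, name = name,
             report = function(s) notes <<- c(notes, s))
  notes
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
checkForWorkers(fib, "fib")               # notes that 'fib' is not visible
checkForWorkers(function(n) sqrt(n) + 1)  # no notes
```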
>
> /Henrik
>
>
> Michael
>
>
> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
> <hb at biostat.ucsf.edu <mailto:hb at biostat.ucsf.edu>>
> wrote:
>
>
> Hi,
>
> in BiocParallel, is there a suggested (or planned) best practice for
> making *locally* assigned variables (e.g. functions) available to the
> applied function when it runs in a separate R process (which will be
> the most common use case)? I understand that local variables should
> be avoided and that it's preferred to put as much as possible in
> packages, but that's not always possible or very convenient.
>
> EXAMPLE:
>
> library('BiocParallel')
> library('BatchJobs')
>
> # Here I pick a recursive function to make the problem a bit harder,
> # i.e. the function needs to call itself ("itself" = see below)
> fib <- function(n=0) {
>   if (n < 0) stop("Invalid 'n': ", n)
>   if (n == 0 || n == 1) return(1)
>   fib(n-2) + fib(n-1)
> }
>
> # Executing in the current R session
> cluster.functions <- makeClusterFunctionsInteractive()
> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
> register(bpParams)
> values <- bplapply(0:9, FUN=fib)
> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>
>
> # Executing in a separate R process, where fib() is not defined
> # (not specific to BiocParallel)
> cluster.functions <- makeClusterFunctionsLocal()
> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
> register(bpParams)
> values <- bplapply(0:9, FUN=fib)
> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
> Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
>   Errors occurred during execution. First error message:
> Error in FUN(...): could not find function "fib"
> [...]
>
>
> # The following illustrates that the solution is not always straightforward.
> # (not specific to BiocParallel; must have been discussed previously)
> values <- bplapply(0:9, FUN=function(n, fib) {
>   fib(n)
> }, fib=fib)
> Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
>   Errors occurred during execution. First error message:
> Error in fib(n): could not find function "fib"
> [...]
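The reason this attempt fails is lexical scoping: the `fib` argument is visible inside the anonymous wrapper, but the recursive calls in fib's own body are looked up in fib's enclosing environment, which on the worker is a global environment without fib. The effect can be reproduced in a single session by giving a copy of fib an enclosure that lacks fib; baseenv() is an assumption chosen only to mimic a bare worker.

```r
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

## Copy of fib whose enclosure, like a worker's empty workspace, lacks 'fib'
fibOnWorker <- fib
environment(fibOnWorker) <- baseenv()

wrapper <- function(n, fib) fib(n)   # finds the argument 'fib' ...
wrapper(1, fib = fibOnWorker)        # ... so no recursion is fine
try(wrapper(5, fib = fibOnWorker))   # fails: could not find function "fib"
```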
>
> # Workaround: make fib() aware of itself
> # (this is something the user needs to do, and would be very
> # hard for BiocParallel et al. to automate. BTW, should all
> # recursive functions be implemented this way?)
> fib <- function(n=0) {
>   if (n < 0) stop("Invalid 'n': ", n)
>   if (n == 0 || n == 1) return(1)
>   fib <- sys.function() # Make function aware of itself
>   fib(n-2) + fib(n-1)
> }
> values <- bplapply(0:9, FUN=function(n, fib) {
>   fib(n)
> }, fib=fib)
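On the question of whether all recursive functions should be written this way: base R also provides Recall(), which calls the function it is used in regardless of the name that function is bound to, so the recursion needs no name lookup at all. A sketch (fibR is just an illustrative name), again mimicking a worker with an enclosure that does not contain the function:

```r
## Recursion via Recall(): the name 'fibR' is never looked up
fibR <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  Recall(n - 2) + Recall(n - 1)
}

## Mimic a worker whose workspace is empty: the enclosure has no 'fibR'
environment(fibR) <- baseenv()
sapply(0:9, fibR)
## [1]  1  1  2  3  5  8 13 21 34 55
```

Since fibR depends on no workspace objects, it should be usable directly as bplapply(0:9, FUN = fibR) on any backend.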
>
>
> WISHLIST:
> Considering the above recursive issue solved, a slightly more explicit
> and standardized solution is then:
>
> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>   for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
>   fib(n)
> }, BPGLOBALS=list(fib=fib))
>
> Could the above be generalized into something as neat as:
>
> bpExport("fib")
> values <- bplapply(0:9, FUN=function(n) {
>   BiocParallel::bpImport("fib")
>   fib(n)
> })
>
> or ideally just (analogous to parallel::clusterExport()):
>
> bpExport("fib")
> values <- bplapply(0:9, FUN=fib)
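Short of new backend support, the BPGLOBALS idea can be packaged as a wrapper that carries the objects in FUN's enclosing environment, which is serialized with FUN. A minimal sketch; withGlobals() is a hypothetical helper, not a BiocParallel API. It also re-homes exported functions that were defined in the calling environment (typically the workspace) into that same environment, which is what lets a recursive fib find itself without the sys.function() trick.

```r
## Hypothetical helper: return a copy of FUN whose enclosure holds `globals`.
withGlobals <- function(FUN, globals) {
  caller <- parent.frame()             # usually the user's workspace
  env <- new.env(parent = environment(FUN))
  for (name in names(globals)) {
    obj <- globals[[name]]
    ## functions defined in the caller are re-homed so that they, too, see
    ## the exported objects (this is what makes recursion work)
    if (is.function(obj) && identical(environment(obj), caller))
      environment(obj) <- env
    assign(name, obj, envir = env)
  }
  environment(FUN) <- env
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

wrapped <- withGlobals(function(n) fib(n), list(fib = fib))
rm(fib)              # simulate a worker that never saw 'fib'
sapply(0:9, wrapped)
## [1]  1  1  2  3  5  8 13 21 34 55
```

With BiocParallel this would be passed as bplapply(0:9, FUN = withGlobals(function(n) fib(n), list(fib = fib))). Every object in `globals` travels with FUN, so large data may be better shipped another way.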
>
> /Henrik
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis