[Bioc-devel] BiocParallel: Best standards for passing locally assigned variables/functions, e.g. a bpExport()?
Ryan
rct at thompsonclan.org
Mon Nov 4 07:46:44 CET 2013
Ok, here is my attempt at a function to get the list of user-defined
free variables that a function refers to:
https://gist.github.com/DarwinAwardWinner/7298557
It uses codetools, so it is subject to the limitations of that package,
but for simple examples it successfully detects when a function refers
to something in the global environment.
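The gist itself is not reproduced here. As a rough sketch of the same idea, the snippet below assumes only codetools' documented `findGlobals()`; the helper name `userGlobals` is hypothetical. It keeps just those globals that are bound directly in the user's workspace, i.e. the ones a fresh worker process would not know about:

```r
library(codetools)

## Hypothetical helper (not the gist's code): list the free symbols of `f`
## that are bound directly in `env`, by default the user's workspace.
userGlobals <- function(f, env = globalenv()) {
  ## findGlobals() returns names of global functions and variables used by f
  candidates <- findGlobals(f)
  ## keep only those defined in env itself, not in attached packages
  Filter(function(sym) exists(sym, envir = env, inherits = FALSE),
         candidates)
}

k <- 1
fib <- function(n = 0) {
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
g <- function(n) fib(n) + k

userGlobals(g)   # contains "fib" and "k"
```

Symbols such as `+` are dropped because they live in base, not in the workspace.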
On Sun Nov 3 21:14:29 2013, Gabriel Becker wrote:
> Ryan (et al),
>
> FYI:
>
> > f
> function() {
> x = rnorm(x)
> x
> }
> > findGlobals(f)
> [1] "=" "{" "rnorm"
>
> "x" should be in the list of globals but it isn't.
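codetools appears to treat any symbol that is assigned anywhere in the function body as local, which is why `x` is missed here. A cruder, deliberately over-inclusive check is sketched below (`candidateGlobals` is a hypothetical name). It takes every variable name in the body, drops the formal arguments, and keeps those that exist in the workspace. It will also flag locals that merely shadow a global, and since `all.vars()` skips names in function-call position it does not see global functions, so it complements rather than replaces `findGlobals()`:

```r
## Hypothetical helper: over-approximate the global variables of `f`.
candidateGlobals <- function(f, env = globalenv()) {
  ## all.vars() lists variable (non-function) names used in the body,
  ## whether or not they are also assigned there
  vars <- setdiff(all.vars(body(f)), names(formals(f)))
  vars[vapply(vars, exists, logical(1), envir = env, inherits = FALSE)]
}

x <- 10
f <- function() {
  x = rnorm(x)
  x
}
candidateGlobals(f)   # "x"
```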
>
> ~G
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] codetools_0.2-8
>
>
>
> On Sun, Nov 3, 2013 at 5:37 PM, Ryan <rct at thompsonclan.org
> <mailto:rct at thompsonclan.org>> wrote:
>
> Looking at the codetools package, I think "findGlobals" is
> basically exactly what we want here, right? As you say, there are
> necessarily limitations due to R being a dynamic language, but the
> goal is to catch common errors, not stop people from tricking the
> check.
>
> I think I'll try to code something up soon.
>
> -Ryan
>
>
> On 11/3/13, 5:10 PM, Gabriel Becker wrote:
>> Henrik,
>>
>> See https://github.com/duncantl/CodeDepends (as used by
>> https://github.com/gmbecker/RCacheSuite). It will identify
>> necessarily defined symbols (input variables) for code that is
>> not doing certain tricks (e.g. get(), mixing data.frame columns and
>> global variables in formulas, etc.).
>>
>> Tierney's codetools package also does things along these lines
>> but there are some situations where it has trouble. I can give
>> more detail if desired.
>>
>> ~G
>>
>>
>> On Sun, Nov 3, 2013 at 3:04 PM, Ryan <rct at thompsonclan.org
>> <mailto:rct at thompsonclan.org>> wrote:
>>
>> Another potential easy step: if FUN is a function in the
>> user's workspace, we could automatically export that
>> function under the same name in the children. This would make
>> recursive functions just work, but it might be a bit too
>> magical.
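One way the "export FUN under its own name" idea could be realised without touching the workers' global environment is to give FUN a private enclosing environment that contains FUN itself. This is a sketch with a hypothetical helper `selfAware()`, relying only on base R's `environment<-`; since a non-global enclosing environment is serialized together with the closure, the binding should travel with FUN to a worker:

```r
## Hypothetical helper: return a copy of FUN whose enclosing environment
## binds FUN under `name`, so recursive calls by that name still resolve
## in a process where `name` is not defined.
selfAware <- function(FUN, name = deparse(substitute(FUN))) {
  force(name)   # capture the name before FUN is modified below
  env <- new.env(parent = environment(FUN))
  environment(FUN) <- env
  assign(name, FUN, envir = env)
  FUN
}

fib <- function(n = 0) {
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
fib2 <- selfAware(fib)
rm(fib)      # simulate a worker in which 'fib' is not defined
fib2(9)      # 55
```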
>>
>>
>> On 11/3/13, 2:38 PM, Ryan wrote:
>>
>> Here's an easy thing we can add to BiocParallel in the
>> short term. The following code defines a wrapper function
>> "withBPExtraErrorText" that simply appends an additional
>> message to the end of any error that looks like it is
>> about a missing variable. We could wrap every evaluation
>> in a similar tryCatch to at least provide a more
>> informative error message when a subprocess has a missing
>> variable.
>>
>> -Ryan
>>
>> withBPExtraErrorText <- function(expr) {
>>     tryCatch({
>>         expr
>>     }, error = function(err) {
>>         ## Also match "could not find function", the error seen
>>         ## for a missing function such as fib()
>>         if (grepl("^object '.*' not found$|^could not find function",
>>                   conditionMessage(err))) {
>>             ## It is an error due to a variable not found.
>>             err$message <- paste0(conditionMessage(err),
>>                 ". Maybe you forgot to export this variable from ",
>>                 "the main R session using \"bpexport\"?")
>>         }
>>         stop(err)  # re-signal the (possibly amended) error
>>     })
>> }
>>
>> x <- 5
>>
>> ## Succeeds
>> withBPExtraErrorText(x)
>>
>> ## Fails with more informative error message
>> withBPExtraErrorText(y)
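To "wrap every evaluation", the same kind of handler could be applied per element by wrapping FUN before it is handed to the apply function. A self-contained sketch follows; the name `withMissingVariableHint` is hypothetical, and base `lapply()` stands in for `bplapply()`:

```r
## Hypothetical wrapper: returns a version of FUN whose errors about
## missing objects or functions carry an extra hint.
withMissingVariableHint <- function(FUN) {
  function(...) {
    tryCatch(FUN(...), error = function(err) {
      msg <- conditionMessage(err)
      if (grepl("^object '.*' not found$|^could not find function", msg)) {
        err$message <- paste0(msg, ". Maybe you forgot to export this ",
                              "variable from the main R session?")
      }
      stop(err)  # re-signal with the amended message
    })
  }
}

g <- function(i) i + undefinedVariable
res <- tryCatch(lapply(1:2, withMissingVariableHint(g)),
                error = conditionMessage)
res   # message now ends with the export hint
```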
>>
>>
>>
>> On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
>>
>> On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
>> <lawrence.michael at gene.com
>> <mailto:lawrence.michael at gene.com>> wrote:
>>
>> An analog to clusterExport is a good idea. To
>> make it even easier, we could
>> have a dynamic environment based on object tables
>> that would catch missing
>> symbols and download them from the parent thread.
>> But maybe there's some
>> benefit to being explicit?
>>
>>
>> A first step to fully automate this would be to
>> provide some (opt
>> in/out) mechanism for code inspection and warn about
>> non-defined
>> objects (cf. 'R CMD check'). That is of course major
>> work, but will
>> certainly spare the community/users 1000's of hours
>> in troubleshooting
>> and the mailing lists from "why doesn't my parallel
>> code work?"
>> messages. Such protection may be better suited for
>> the 'parallel'
>> package though. Unfortunately, it's beyond my
>> skills/time to pull
>> such a thing together.
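The 'R CMD check' warnings mentioned here come from codetools' `checkUsage()`. A rough sketch of an opt-in check along those lines is below; `checkForWorker` is a hypothetical name, and it assumes that a fresh worker sees the attached packages but not the user's workspace, approximated by re-parenting a copy of FUN onto the search path just below the global environment (so it also assumes FUN was defined at top level):

```r
library(codetools)

## Hypothetical check: report symbols FUN uses that would have no visible
## definition once the user's workspace (globalenv) is out of reach.
checkForWorker <- function(FUN) {
  environment(FUN) <- parent.env(globalenv())  # skip the workspace
  msgs <- character()
  checkUsage(FUN, name = "FUN",
             report = function(m) msgs <<- c(msgs, m))
  msgs
}

fib <- function(n = 0) {
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
checkForWorker(fib)   # mentions the missing function 'fib'
```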
>>
>> /Henrik
>>
>>
>> Michael
>>
>>
>> On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
>> <hb at biostat.ucsf.edu <mailto:hb at biostat.ucsf.edu>>
>> wrote:
>>
>>
>> Hi,
>>
>> in BiocParallel, is there a suggested (or planned) best practice for
>> making *locally* assigned variables (e.g. functions) available to the
>> applied function when it runs in a separate R process (which will be
>> the most common use case)? I understand that local variables should
>> be avoided and that it's preferable to put as much as possible in
>> packages, but that's not always possible or very convenient.
>>
>> EXAMPLE:
>>
>> library('BiocParallel')
>> library('BatchJobs')
>>
>> # Here I pick a recursive function to make the problem a bit harder,
>> # i.e. the function needs to call itself ("itself" = see below)
>> fib <- function(n=0) {
>>   if (n < 0) stop("Invalid 'n': ", n)
>>   if (n == 0 || n == 1) return(1)
>>   fib(n-2) + fib(n-1)
>> }
>>
>> # Executing in the current R session
>> cluster.functions <- makeClusterFunctionsInteractive()
>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>> register(bpParams)
>> values <- bplapply(0:9, FUN=fib)
>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>>
>>
>> # Executing in a separate R process, where fib() is not defined
>> # (not specific to BiocParallel)
>> cluster.functions <- makeClusterFunctionsLocal()
>> bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
>> register(bpParams)
>> values <- bplapply(0:9, FUN=fib)
>> ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
>> ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
>> Error in LastError$store(results = results, is.error = !ok,
>>     throw.error = TRUE) :
>>   Errors occurred during execution. First error message:
>> Error in FUN(...): could not find function "fib"
>> [...]
>>
>>
>> # The following illustrates that the solution is not always
>> # straightforward.
>> # (not specific to BiocParallel; must have been discussed previously)
>> values <- bplapply(0:9, FUN=function(n, fib) {
>>   fib(n)
>> }, fib=fib)
>> Error in LastError$store(results = results, is.error = !ok,
>>     throw.error = TRUE) :
>>   Errors occurred during execution. First error message:
>> Error in fib(n): could not find function "fib"
>> [...]
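The failure follows from R's lexical scoping: passing `fib` as an argument makes it visible inside the anonymous FUN, but the recursive call inside `fib` is looked up in `fib`'s own enclosing environment, which on the worker is a global environment without `fib`. This can be reproduced in a single session; in the sketch below, an environment whose parent is `baseenv()` is only a stand-in for a fresh worker's workspace:

```r
fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

## Stand-in for the worker: a copy of fib whose enclosing environment
## does not contain 'fib' (only base R is reachable from it).
workerFib <- fib
environment(workerFib) <- new.env(parent = baseenv())

FUN <- function(n, fib) fib(n)
FUN(1, fib = workerFib)               # 1: no recursive call needed
tryCatch(FUN(5, fib = workerFib),
         error = conditionMessage)    # could not find function "fib"
```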
>>
>> # Workaround: make fib() aware of itself
>> # (this is something the user needs to do, and would be very
>> # hard for BiocParallel et al. to automate. BTW, should all
>> # recursive functions be implemented this way?)
>> fib <- function(n=0) {
>>   if (n < 0) stop("Invalid 'n': ", n)
>>   if (n == 0 || n == 1) return(1)
>>   fib <- sys.function()  # Make function aware of itself
>>   fib(n-2) + fib(n-1)
>> }
>> values <- bplapply(0:9, FUN=function(n, fib) {
>>   fib(n)
>> }, fib=fib)
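Base R also provides `Recall()`, which calls the function in which it appears regardless of the name that function is bound to, so a recursive function written with it does not need to find itself by name. A sketch follows (`fibRecall` is only an illustrative name; whether every recursive function should be written this way remains a judgment call):

```r
fibRecall <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  Recall(n - 2) + Recall(n - 1)   # no lookup of 'fibRecall' by name
}

## Still works when the enclosing environment cannot see 'fibRecall'
f <- fibRecall
environment(f) <- new.env(parent = baseenv())
f(9)   # 55
```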
>>
>>
>> WISHLIST:
>> Considering the above recursive issue solved,
>> a slightly more explicit
>> and standardized solution is then:
>>
>> values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
>>   for (name in names(BPGLOBALS))
>>     assign(name, BPGLOBALS[[name]])
>>   fib(n)
>> }, BPGLOBALS=list(fib=fib))
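A variant of the BPGLOBALS idea that leaves FUN's body unchanged is to bind the globals in FUN's enclosing environment, also re-pointing any exported closures at that environment so that recursive functions such as `fib` can find themselves. This is a sketch under stated assumptions: `withGlobals` is hypothetical (not a BiocParallel function), and exported functions are assumed to have been defined at top level, since their original enclosing environment is replaced:

```r
## Hypothetical helper: bind `globals` (a named list) in a new environment
## that encloses FUN and every closure in `globals`.
withGlobals <- function(FUN, globals) {
  env <- new.env(parent = environment(FUN))
  for (name in names(globals)) {
    obj <- globals[[name]]
    if (is.function(obj) && !is.primitive(obj))
      environment(obj) <- env   # lets exported functions see each other
    assign(name, obj, envir = env)
  }
  environment(FUN) <- env
  FUN
}

fib <- function(n = 0) {
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
task <- withGlobals(function(n) fib(n), list(fib = fib))
rm(fib)                          # the workspace no longer defines 'fib'
vapply(0:9, task, numeric(1))    # 1 1 2 3 5 8 13 21 34 55
```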
>>
>> Could the above be generalized into something
>> as neat as:
>>
>> bpExport("fib")
>> values <- bplapply(0:9, FUN=function(n) {
>> BiocParallel::bpImport("fib")
>> fib(n)
>> })
>>
>> or ideally just (analogously to
>> parallel::clusterExport()):
>>
>> bpExport("fib")
>> values <- bplapply(0:9, FUN=fib)
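For comparison, this is roughly how the existing `parallel::clusterExport()` handles the example: it copies the named objects into the global environment of each worker, where the recursive lookup of `fib` then succeeds. A sketch, assuming a local PSOCK cluster with two workers can be started:

```r
library(parallel)

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}

cl <- makeCluster(2)               # two local worker processes
clusterExport(cl, "fib")           # assign 'fib' in each worker's workspace
values <- parLapply(cl, 0:9, fib)  # fib() can now call itself on workers
stopCluster(cl)

unlist(values)   # 1 1 2 3 5 8 13 21 34 55
```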
>>
>> /Henrik
>>
>> _______________________________________________
>> Bioc-devel at r-project.org
>> <mailto:Bioc-devel at r-project.org> mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Gabriel Becker
>> Graduate Student
>> Statistics Department
>> University of California, Davis
>
>
>
>