[R-sig-hpc] Problems with Exporting Functions with Foreach/DoSNOW

Stephen Weston stephen.b.weston at gmail.com
Mon Nov 28 22:59:22 CET 2011


Hi Reuben,

The problem is that main.fun expects to find helper1 and helper2
in the global environment, but the foreach .export argument is
exporting them to a temporary environment that is only used for
the duration of that foreach operation.  Thus, foreach is able
to find main.fun, but main.fun can't find helper1 or helper2
since main.fun is only looking in the global environment and the
currently loaded packages.
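
The lookup failure can be reproduced in a single R process with no
cluster at all. This is only a rough sketch of the scoping rule, not
doSNOW's actual code: export.env is a hypothetical stand-in for the
temporary export environment, and the sketch assumes a fresh session
in which helper1 and helper2 are not defined globally.

```r
# main.fun's enclosing environment is the global environment
# (set explicitly so the sketch behaves the same however it is run).
main.fun <- function(i) helper1(i) + helper2(i)
environment(main.fun) <- globalenv()

# Hypothetical stand-in for the temporary environment used by .export.
export.env <- new.env()
assign("helper1", function(i) i + 1, envir = export.env)
assign("helper2", function(i) i + 2, envir = export.env)
assign("main.fun", main.fun, envir = export.env)

# Evaluating the loop body in export.env finds main.fun, but main.fun
# resolves helper1 through its own enclosure, not through export.env.
result <- tryCatch(
    eval(quote(main.fun(1)), envir = export.env),
    error = function(e) conditionMessage(e)
)
print(result)
```

The call fails inside main.fun even though helper1 sits right next to
main.fun in export.env, which is the same kind of failure as the
"could not find function" error in the original report.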

You don't have this problem with doMC, since the workers are
dynamically forked by the multicore package, so they actually
have helper1 and helper2 defined in the global environment, just
like your R session.  doSNOW tries to compensate by exporting
variables, both automatically and through the .export argument,
but it's far from perfect, as your example aptly demonstrates.

To avoid this problem, foreach would either have to export the
functions to the global environment, or modify main.fun to
include the temporary environment in its scope.  Both of those
options seem to have problems.
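
To make the second option concrete, here is a rough single-process
sketch of what "including the temporary environment in its scope"
would mean mechanically. The names export.env and rescoped are
hypothetical, and this is not something foreach does; it only
illustrates the idea.

```r
# Hypothetical stand-in for the temporary export environment.
export.env <- new.env()
export.env$helper1 <- function(i) i + 1
export.env$helper2 <- function(i) i + 2

main.fun <- function(i) helper1(i) + helper2(i)

# Re-point a copy of main.fun at the export environment, so that its
# free variables (helper1, helper2) are looked up there first.
rescoped <- main.fun
environment(rescoped) <- export.env

print(rescoped(1))  # (1 + 1) + (1 + 2) = 5
```

The copy now finds the helpers, but only because the function's
environment was changed behind the user's back, which is part of why
this is not an obviously safe thing for foreach to do.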

In general, I prefer to put these kinds of functions into a
package.  Then you just need to use the foreach .packages
argument to load that package on the workers.
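
As a rough illustration of the .packages argument, the sketch below
uses the tools package (shipped with R) as a stand-in for a package
holding your own functions. It assumes foreach and doSNOW are
installed and that two local socket workers can be started.

```r
library(foreach)
library(doSNOW)   # also attaches snow, which provides makeCluster

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# file_ext() lives in the tools package, which is not attached on the
# workers by default; .packages attaches it there before the loop runs.
exts <- foreach(f = c("a.csv", "b.txt", "c.R"),
                .combine = "c",
                .packages = "tools") %dopar%
{
    file_ext(f)
}
print(exts)

stopCluster(cl)
```

With your own package in place of tools, main.fun and its helpers
would all live in the package namespace, so main.fun finds them
regardless of which parallel backend is registered.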

A simple alternative is to use the snow clusterExport function
to export main.fun, helper1, and helper2 to the snow workers.
Then they really will be defined in the global environment as
main.fun expects.  That isn't portable between different
parallel backends, of course.  That's why I think that putting
them in a package is a better option.

Here's a modified version of your example that uses the
clusterExport function to fix the problem:


library(foreach)
library(snow)
library(doSNOW)

# Two helper functions
helper1 <- function(i) { return(i + 1) }
helper2 <- function(i) { return(i + 2) }

# The main function called once each loop
main.fun <- function(i)
{
   # Call two other functions
   return(helper1(i) + helper2(i))
}

# Compute the values (odd numbers from 5 to 23) using a for loop
compute.local <- function()
{
   values <- c()
   for (i in 1:10)
   {
       values <- c(values, main.fun(i))
   }

   return(values)
}

# Compute the values (odd numbers from 5 to 23) using a foreach loop
compute.cluster <- function()
{
   values <- foreach(i = 1:10,
                     .combine = "c") %dopar%
   {
       main.fun(i)
   }

   return(values)
}

# Start a two-worker socket cluster, export the functions to the
# workers' global environments, and register the cluster with doSNOW
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, c("main.fun", "helper1", "helper2"))
registerDoSNOW(cl)

print(compute.local())
print(compute.cluster())

# Stop the cluster
stopCluster(cl)


And thanks for the excellent test program.

- Steve


On Mon, Nov 28, 2011 at 3:36 PM, Reuben Bellika <reuben at deltamotion.com> wrote:
> Hello,
>
> I am currently using the "foreach" and "doSNOW" packages to run calculations on our cluster (using SNOW over sockets). Sometimes things get rather complex with multiple functions being called inside the foreach loop. There seems to be a limitation on how I can pass functions to the SNOW cluster workers. If one main function gets evaluated multiple times inside the foreach loop, and this function calls several helper functions, I get an error saying that these helper functions are undefined, even if they are included in the export option.
>
> Here's an example of what I'm talking about:
>
>
> library(foreach)
> library(snow)
> library(doSNOW)
>
> # Two helper functions
> helper1 <- function(i) { return(i + 1) }
> helper2 <- function(i) { return(i + 2) }
>
> # The main function called once each loop
> main.fun <- function(i)
> {
>    # Call two other functions
>    return(helper1(i) + helper2(i))
> }
>
> # Compute the values (odd numbers from 5 to 23) using a for loop
> compute.local <- function()
> {
>    values <- c()
>    for (i in 1:10)
>    {
>        values <- c(values, main.fun(i))
>    }
>
>    return(values)
> }
>
> # Compute the values (odd numbers from 5 to 23) using a foreach loop
> compute.cluster <- function()
> {
>    values <- foreach(i = 1:10,
>                      .export = c("main.fun", "helper1", "helper2"),
>                      .combine = "c") %dopar%
>    {
>        main.fun(i)
>    }
>
>    return(values)
> }
>
> # Start the cluster and register with doSNOW (node names are just examples)
> cl <- makeCluster(c("r-node-001", "r-node-002"), type = "SOCK")
> registerDoSNOW(cl)
>
> print(compute.local())
> print(compute.cluster())
>
> # Stop the cluster
> stopCluster(cl)
>
>
> When I run this example on our cluster, I get the expected output from the compute.local() function:
> [1]  5  7  9 11 13 15 17 19 21 23
>
> The compute.cluster() function, however, terminates with an error:
> Error in { : task 1 failed - "could not find function "helper1""
>
> Note that there was no problem with exporting main.fun, only helper1 and helper2.
>
> So, what I am wondering, is there something I don't understand about the initialization of the foreach loop, or is there a bug of some type with the doSNOW package? I've tried running this same example using doMC to evaluate the foreach loop instead of doSNOW, and it works perfectly, with no errors. I can work around the error by defining helper1 and helper2 inside the body of main.fun, but this gets awkward in real cases where the helper functions themselves are very complex and I want to reuse them in other places.
>
> The cluster master and worker nodes are running R 2.14.0 on Debian "Lenny" i386.
>
> Any ideas?
>
> Thank you,
> Reuben Bellika
>