[R-sig-hpc] Problems with Exporting Functions with Foreach/DoSNOW

Reuben Bellika reuben at deltamotion.com
Tue Nov 29 00:10:38 CET 2011


Hi Steve,

Thanks for your reply.

That all makes sense. I have noticed that functions in packages get loaded just fine, and I'm intending to go that route at some point.

I think I will look into clusterExport for the time being - it's simpler for development.

Thank you,
Reuben

-----Original Message-----
From: Stephen Weston [mailto:stephen.b.weston at gmail.com] 
Sent: Monday, November 28, 2011 1:59 PM
To: Reuben Bellika
Cc: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Problems with Exporting Functions with Foreach/DoSNOW

Hi Reuben,

The problem is that main.fun expects to find helper1 and helper2 in the global environment, but the foreach .export argument is exporting them to a temporary environment that is only used for the duration of that foreach operation.  Thus, foreach is able to find main.fun, but main.fun can't find helper1 or helper2 since main.fun is only looking in the global environment and the currently loaded packages.
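
The lookup failure can be reproduced in a single R session without a cluster. This is only a rough sketch: export_env, helper_demo and main_demo are hypothetical names, and the temporary environment merely stands in for the one foreach exports into on a worker.

```r
# Hypothetical names throughout; this only mimics what happens on a worker.
# A temporary environment stands in for the one foreach exports into.
export_env <- new.env(parent = globalenv())
assign("helper_demo", function(i) i + 1, envir = export_env)

# Like main.fun, this function's enclosing environment is the global one.
main_demo <- function(i) helper_demo(i)
environment(main_demo) <- globalenv()
assign("main_demo", main_demo, envir = export_env)

# The loop body is evaluated in export_env, so main_demo is found there,
# but main_demo looks up helper_demo starting from the global environment.
msg <- tryCatch(
  eval(quote(main_demo(1)), envir = export_env),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message printed should be the same kind of "could not find function" error that the worker reports.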

You don't have this problem with doMC, since the workers are dynamically forked by the multicore package, so they actually have helper1 and helper2 defined in the global environment, just like your R session.  doSNOW tries to help by automatic and manual exporting tricks, but it's far from perfect, as your example aptly demonstrates.
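
A small sketch of the forking behaviour, assuming a Unix-alike system. It uses mclapply from the parallel package (which ships with R 2.14.0 and later and includes code derived from multicore) rather than doMC itself, and helper_fork and main_fork are hypothetical names.

```r
library(parallel)  # ships with R since 2.14.0

helper_fork <- function(i) i + 1
main_fork <- function(i) helper_fork(i) + 1

# Forking is only available on Unix-alikes; fall back to one core elsewhere.
cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Forked workers are copies of this session, so helper_fork is already
# defined in them and nothing needs to be exported.
res_fork <- unlist(mclapply(1:3, main_fork, mc.cores = cores))
print(res_fork)
```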

To avoid this problem, foreach would either have to export the functions to the global environment, or modify main.fun to include the temporary environment in its scope.  Both of those options seem to have problems.
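
The two options can be mimicked in a single session. This is a sketch under the same assumption as before: a temporary environment stands in for the one foreach uses on a worker, and helper_tmp and main_tmp are hypothetical names. It is not what foreach actually does.

```r
# Hypothetical names; a temporary environment stands in for the one that
# foreach exports into on a worker.
export_env <- new.env(parent = globalenv())
assign("helper_tmp", function(i) i + 1, envir = export_env)
main_tmp <- function(i) helper_tmp(i)
environment(main_tmp) <- globalenv()

# Option 1: copy the helper into the global environment (a lasting side effect).
assign("helper_tmp", get("helper_tmp", envir = export_env), envir = globalenv())
res_global <- main_tmp(1)
rm("helper_tmp", envir = globalenv())   # undo the side effect

# Option 2: re-scope the function so it searches the temporary environment.
environment(main_tmp) <- export_env
res_rescoped <- main_tmp(1)

print(c(res_global, res_rescoped))
```

One plausible reading of the drawbacks: the first option can silently overwrite objects that already exist in a worker's global environment, and the second changes where the function resolves all of its free variables, not just the helpers.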

In general, I prefer to put these kinds of functions into a package.  Then you just need to use the foreach .packages argument to load that package on the workers.
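
As a rough illustration of the .packages argument, the sketch below uses the standard tools package as a stand-in for your own package, since tools is not attached on the workers by default. It assumes two local socket workers can be started.

```r
library(foreach)
library(snow)
library(doSNOW)

# Two local socket workers; adjust for a real cluster.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# file_ext lives in the tools package; .packages asks each worker to load
# that package before evaluating the loop body.
exts <- foreach(f = c("a.csv", "b.txt"),
                .combine = "c",
                .packages = "tools") %dopar%
{
    file_ext(f)
}
print(exts)

stopCluster(cl)
```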

A simple alternative is to use the snow clusterExport function to export main.fun, helper1, and helper2 to the snow workers.  Then they really will be defined in the global environment as main.fun expects.  That isn't portable between different parallel backends, of course.  That's why I think that putting them in a package is a better option.

Here's a modified version of your example that uses the clusterExport function to fix the problem:


library(foreach)
library(snow)
library(doSNOW)

# Two helper functions
helper1 <- function(i) { return(i + 1) }
helper2 <- function(i) { return(i + 2) }

# The main function called once each loop
main.fun <- function(i) {
   # Call two other functions
   return(helper1(i) + helper2(i))
}

# Compute the values (odd numbers from 5 to 23) using a for loop
compute.local <- function() {
   values <- c()
   for (i in 1:10)
   {
       values <- c(values, main.fun(i))
   }

   return(values)
}

# Compute the values (odd numbers from 5 to 23) using a foreach loop
compute.cluster <- function() {
   values <- foreach(i = 1:10,
                     .combine = "c") %dopar%
   {
       main.fun(i)
   }

   return(values)
}

# Start a two-worker socket cluster, export the functions to each
# worker's global environment, and register with doSNOW
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, c("main.fun", "helper1", "helper2"))
registerDoSNOW(cl)

print(compute.local())
print(compute.cluster())

# Stop the cluster
stopCluster(cl)


And thanks for the excellent test program.

- Steve


On Mon, Nov 28, 2011 at 3:36 PM, Reuben Bellika <reuben at deltamotion.com> wrote:
> Hello,
>
> I am currently using the "foreach" and "doSNOW" packages to run calculations on our cluster (using SNOW over sockets). Sometimes things get rather complex with multiple functions being called inside the foreach loop. There seems to be a limitation on how I can pass functions to the SNOW cluster workers. If one main function gets evaluated multiple times inside the foreach loop, and this function calls several helper functions, I get an error saying that these helper functions are undefined, even if they are included in the export option.
>
> Here's an example of what I'm talking about:
>
>
> library(foreach)
> library(snow)
> library(doSNOW)
>
> # Two helper functions
> helper1 <- function(i) { return(i + 1) }
> helper2 <- function(i) { return(i + 2) }
>
> # The main function called once each loop
> main.fun <- function(i) {
>    # Call two other functions
>    return(helper1(i) + helper2(i))
> }
>
> # Compute the values (odd numbers from 5 to 23) using a for loop 
> compute.local <- function() {
>    values <- c()
>    for (i in 1:10)
>    {
>        values <- c(values, main.fun(i))
>    }
>
>    return(values)
> }
>
> # Compute the values (odd numbers from 5 to 23) using a foreach loop 
> compute.cluster <- function() {
>    values <- foreach(i = 1:10,
>                      .export = c("main.fun", "helper1", "helper2"),
>                      .combine = "c") %dopar%
>    {
>        main.fun(i)
>    }
>
>    return(values)
> }
>
> # Start the cluster and register with doSNOW (node names are just examples)
> cl <- makeCluster(c("r-node-001", "r-node-002"), type = "SOCK")
> registerDoSNOW(cl)
>
> print(compute.local())
> print(compute.cluster())
>
> # Stop the cluster
> stopCluster(cl)
>
>
> When I run this example on our cluster, I get the expected output from the compute.local() function:
> [1]  5  7  9 11 13 15 17 19 21 23
>
> The compute.cluster() function, however, terminates with an error:
> Error in { : task 1 failed - "could not find function "helper1""
>
> Note that there was no problem with exporting main.fun, only helper1 and helper2.
>
> So, what I am wondering, is there something I don't understand about the initialization of the foreach loop, or is there a bug of some type with the doSNOW package? I've tried running this same example using doMC to evaluate the foreach loop instead of doSNOW, and it works perfectly, with no errors. I can work around the error by defining helper1 and helper2 inside the body of main.fun, but this gets awkward in real cases where the helper functions themselves are very complex and I want to reuse them in other places.
>
> The cluster master and worker nodes are running R 2.14.0 on Debian "Lenny" i386.
>
> Any ideas?
>
> Thank you,
> Reuben Bellika
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>


