[R] error serialize (foreach)

Doran, Harold HDoran at air.org
Sun Dec 4 03:11:01 CET 2016


As a follow-up to this, I have been able to create a toy example of reproducible code that shows the same problem. Below is just a sample to represent the issue; my actual data and the functions acting on them are much more involved.

I no longer have the error, but the loop running in parallel is extremely slow relative to its sequential (%do%) counterpart.

I have narrowed the problem down to the fact that I am searching through a very large list, grabbing an element from that list by indexing, and then doing stuff to it. Both versions "work", but the parallel version is very, very slow. I believe I am sending the data to each core and the number of searches happening is prohibitive.
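The lookup `myList[[which(names(myList) == nms[i])]]` compares the target against all N names on every iteration, so each lookup costs time proportional to the length of the list. A minimal base-R sketch, assuming the same list-of-vectors layout as the toy example below but a smaller, hypothetical N so it runs quickly: resolving every wanted name to a position once with `match()`, then indexing by position, returns the same element without repeating the scan.

```r
# Hypothetical toy data in the same shape as the post (smaller N for speed)
set.seed(1)
N <- 10000
myList <- lapply(seq_len(N), function(i) rnorm(100))
names(myList) <- seq_len(N)  # names become "1", "2", ..., "10000"
nms <- seq_len(N)

# Original pattern: which() compares against all N names on every call
lookup_which <- function(i) myList[[which(names(myList) == nms[i])]]

# Alternative: translate all wanted names to positions in one pass with match(),
# after which each lookup is a plain positional index
idx <- match(as.character(nms), names(myList))
lookup_match <- function(i) myList[[idx[i]]]

# Both approaches return the same element
stopifnot(identical(lookup_which(42), lookup_match(42)))
```

Because `idx` is computed once on the master, a loop that uses it does not need to search the names inside each iteration at all.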

I am very much stuck in the way I would design this particular problem on a single core and am not sure whether there is a better design-based approach for solving it in the parallel version.
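One possible restructuring, sketched under the assumption that each iteration needs only one element of the list: select the needed elements on the master and let foreach iterate over them directly, so each task carries only its own element. On a socket (PSOCK) cluster, foreach exports the variables that the loop body references; since this body references only `dat`, the full list is not referenced and should not need to be shipped to the workers. The two-worker cluster and the data are illustrative assumptions.

```r
library(foreach)
library(doParallel)

# Illustrative two-worker socket cluster
cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical toy data in the same shape as the post
set.seed(1)
N <- 1000
myList <- lapply(seq_len(N), function(i) rnorm(100))
names(myList) <- seq_len(N)
nms <- seq_len(N)

# Pick out the needed elements once, on the master, by name
sub <- myList[as.character(nms[1:3])]

# Iterate over the elements themselves: each task receives one vector,
# and the loop body never touches myList
result <- foreach(dat = sub) %dopar% {
  mean(dat)
}

stopCluster(cl)
```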

Any advice on better ways to work with the %dopar% version here?

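N <- 200000
# Build a large named list: N elements, each a vector of 100 random draws
myList <- vector('list', N)
names(myList) <- 1:N
for(i in 1:N){
	myList[[i]] <- rnorm(100)
}
nms <- 1:N
library(foreach)
library(doParallel)
registerDoParallel(cores=7)

# Sequential version: each iteration scans all N names with which()
result <- foreach(i = 1:3) %do% {
	dat <- myList[[which(names(myList) == nms[i])]]
	mean(dat)
}

# Parallel version: same body, so every worker needs access to all of myList
result <- foreach(i = 1:3) %dopar% {
	dat <- myList[[which(names(myList) == nms[i])]]
	mean(dat)
}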
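When the real loop runs over many indices, per-task overhead (sending each task to a worker and collecting each result) can outweigh cheap per-iteration work, which may contribute to the slowdown. A common remedy, shown here as a sketch rather than a drop-in fix, is to split the work into one chunk per worker so each worker receives its sub-list once and loops over it locally. `parallel::splitIndices()` divides the positions into contiguous groups; the worker count and the toy data are assumptions.

```r
library(foreach)
library(doParallel)

n_workers <- 2  # assumed worker count
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Hypothetical toy data in the same shape as the post
set.seed(1)
N <- 1000
myList <- lapply(seq_len(N), function(i) rnorm(100))
names(myList) <- seq_len(N)

# One contiguous block of positions per worker, then the matching sub-lists
chunks <- lapply(parallel::splitIndices(N, n_workers), function(ix) myList[ix])

# Each task processes a whole sub-list; .combine = c joins the per-chunk vectors
result <- foreach(chunk = chunks, .combine = c) %dopar% {
  vapply(chunk, mean, numeric(1))
}

stopCluster(cl)
```

Because foreach returns results in iteration order by default, `result` lines up with the original list order.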
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
Sent: Saturday, December 03, 2016 4:26 PM
To: r-help at r-project.org
Subject: [R] error serialize (foreach)

I have a portion of a foreach loop that fails when run in parallel but works fine when run sequentially. Below is a representation of the problem; in this instance I cannot provide reproducible data that generates the same error, because the actual data I am working with are confidential.

Within each foreach loop, a series of custom functions act on my data. Using %do% I get the expected result, but replacing it with %dopar% generates the error.

I have searched the archives and also Stack Exchange, and I see this is an issue that comes up. I have tried a couple of the recommendations, such as using an outfile in makeCluster, but I am not having success.
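For reference, the outfile approach mentioned above looks roughly like this when the cluster is created explicitly and then passed to `registerDoParallel()`. With `outfile = ""`, worker output is not redirected, so messages the workers print typically appear in the terminal running R (some GUIs may not display them). The loop body is a placeholder, not the code from this post.

```r
library(foreach)
library(doParallel)

# outfile = "" leaves worker stdout/stderr unredirected, so their output
# typically appears in the terminal running the master R session
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

# Placeholder body: report progress from each worker, then do some work
result <- foreach(i = 1:4, .combine = c) %dopar% {
  cat("worker", Sys.getpid(), "starting i =", i, "\n")
  i^2
}

stopCluster(cl)
```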

Oddly (or perhaps not oddly), other portions of my program run in parallel and do not generate this same error.
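The message `error writing to connection` from `serialize()` means data could not be written to a worker's connection; one commonly reported cause is a worker process that has died, for instance after running out of memory. Since other portions run fine in parallel, one hypothesis worth checking is that this portion ships much more data to each worker. A quick way to gauge that, sketched below with hypothetical stand-in data, is to measure how many bytes an object occupies once serialized, which is roughly what must be written to each worker's connection.

```r
# Hypothetical stand-in for the objects the loop body uses
set.seed(1)
big <- lapply(1:1000, function(i) rnorm(100))

# serialize() with connection = NULL returns a raw vector holding the bytes
# that would be written to a connection
bytes <- length(serialize(big, connection = NULL))
cat(sprintf("Serialized size: %.2f MB\n", bytes / 2^20))
```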

library(foreach)
library(doParallel)
registerDoParallel(cores=3)

# This portion runs and produces the expected result
result <- foreach(i = 1:N) %do% {
	tmp1 <- function1(...)
	tmp2 <- function2(...)
	tmp2
}

# This portion generates the error in serialize
result <- foreach(i = 1:N) %dopar% {
	tmp1 <- function1(...)
	tmp2 <- function2(...)
	tmp2
}

error in serialize(data, node$con) : error writing to connection



______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


