[R-SIG-Mac] parLapply is not using all nodes

Fri Dec 11 11:00:59 CET 2015


I have a piece of code that needs to run in parallel, and it worked fine about six months ago. When I reran it yesterday, however, it behaved oddly: not all worker processes are given a share of the computation. I put together a small example that reproduces the issue. Here is a snapshot of the output of top while it runs:

PID  COMMAND  %CPU  TIME
872  R        97.7  00:02.45
871  R         0.0  00:00.03
870  R        98.2  00:02.93
869  R         0.0  00:00.03
868  R        97.7  00:02.46
867  R         0.0  00:00.03
866  R        94.4  00:02.36
862  R         1.0  00:04.64

Interestingly, mclapply seems to put all of its worker processes to work right away.

Is this the intended behavior?

Example that reproduces the issue on my machine:
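library(parallel)  # provides detectCores(), makeCluster(), parLapply()
rm(list = ls())
# Probit-type expression and its mixed second derivatives with respect to x2 and each coefficient
expr <- expression(pnorm(b0 + b1*x1 + b2*x2 + b3*x3))
G <- list(
	D(D(expr, "x2"), "b0"),
	D(D(expr, "x2"), "b1"),
	D(D(expr, "x2"), "b2"),
	D(D(expr, "x2"), "b3"))
# Coefficient values
b0 <-  1.2
b1 <-  0.4
b2 <-  0.2
b3 <- -0.6
# Simulated covariates
x1 <- rnorm(10^7)
x2 <- rnorm(10^7)
x3 <- rnorm(10^7)
# Fork one worker per core, leaving one core free; forked workers inherit the data above
nc <- detectCores() - 1
cl <- makeCluster(nc, type = "FORK")
# G has four elements, so parLapply has four tasks to hand out
grad_g <- parLapply(cl, G, function(Z) lapply(Z, function(x) mean(eval(x))))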

Output of sessionInfo():
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
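
A minimal diagnostic sketch related to the example above, assuming a Unix-like system where FORK clusters are available. It has each task return the PID of the worker that ran it, so the busy and idle workers can be matched against the PIDs shown by top. The names n_workers, n_tasks, task_pids, worker_pids and idle_pids are illustrative and not part of the original code, and the small sizes are chosen only so that there are fewer tasks than workers, as in the example (four elements in G).

```r
library(parallel)

# Hypothetical small setup: fewer tasks than workers, as in the example above.
n_workers <- 3L
n_tasks   <- 2L

# FORK clusters are only available on Unix-like systems.
cl <- makeCluster(n_workers, type = "FORK")

# Each task returns the PID of the worker process that executed it.
task_pids <- unlist(parLapply(cl, seq_len(n_tasks), function(i) Sys.getpid()))

# PIDs of every worker in the cluster, for comparison.
worker_pids <- unlist(clusterCall(cl, Sys.getpid))

stopCluster(cl)

# Workers that never ran a task.
idle_pids <- setdiff(worker_pids, task_pids)

print(task_pids)
print(idle_pids)
```

parLapply splits its input into chunks up front, one chunk per worker, so with fewer list elements than workers some chunks are empty and the corresponding workers have nothing to compute. If that is what is happening here, one option would be to break the work into at least as many pieces as there are workers.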

