[R-SIG-Mac] parLapply is not using all nodes

ALPEROVYCH Yan ALPEROVYCH at em-lyon.com
Fri Dec 11 17:20:46 CET 2015


Hi John,

I made the example a bit more complicated so that it takes longer to evaluate (see below). All cores seem to be busy, but the environment in which the evaluation takes place appears to matter. I ran the code twice, once in the global environment and once in a separately created one; here are the timings:

Timing in global environment
user  system elapsed
0.359   0.153  44.263

Timing in a separate environment
user  system elapsed
0.528   0.386  65.376

My original objects are created within a function and are much bigger than in this example. I have 122^2 expressions to evaluate. The data matrix has more than 42k rows (10^6 in the example) and 123 columns (n in the example). Because this runs inside a function, and I have not yet figured out how to break the data matrix into individual objects stored in the function environment, I put everything into a newly created environment (list2env) and performed the evaluation there. Could this be the reason some cores are idle?
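A possible alternative to building a separate environment, offered only as a sketch: eval() accepts a named list as its envir argument, so the coefficients and the matrix columns can be combined into one list inside the function. The helper name eval_mean and the small problem size below are hypothetical and not part of the original code.

```r
# Hypothetical helper: evaluate one derivative expression against the
# columns of x and the named coefficients in b, without assigning
# anything into the global environment.
eval_mean <- function(e, x, b) {
  # c() on a list and a data frame yields one named list: b1..bn, x1..xn
  data <- c(b, as.data.frame(x))
  mean(eval(e, envir = data))
}

set.seed(1)
n     <- 3
vars  <- paste0("x", 1:n)
coefs <- paste0("b", 1:n)
x <- matrix(rnorm(50 * n), ncol = n, dimnames = list(NULL, vars))
b <- setNames(as.list(c(0.5, -0.2, 0.1)), coefs)

expr <- parse(text = "pnorm(b1*x1 + b2*x2 + b3*x3)")
d    <- D(D(expr, "x2"), "b1")   # second-order cross derivative
res  <- eval_mean(d, x, b)
```

Passing x and b as arguments also means the same function can be handed to parLapply on a non-FORK cluster, since nothing depends on objects that exist only in the master session.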

Thank you again for your help.

Yan

#
rm(list = ls())
library(parallel)
set.seed(123)
#
n <- 20
expr <- parse(text = paste("pnorm(", paste(paste("b", 1:n, sep = ""), paste("x", 1:n, sep = ""), sep = "*", collapse = " + "), ")", sep = ""))
coefs <- paste("b", 1:n, sep = "")
vars <- paste("x", 1:n, sep = "")
G <- lapply(lapply(vars, function(x) D(expr, x)), function(x) lapply(coefs, function(arg) D(x, arg)))
#
b <- as.list(rnorm(n, 5, 12))
names(b) <- coefs
x <- matrix(rnorm(10^6*(n)), ncol = n)
colnames(x) <- vars
#
ev <- new.env()
list2env(setNames(split(x, col(x)), vars), envir = ev)
list2env(b, envir = ev)
#
nc <- detectCores() - 1
cl <- makeCluster(nc, type = "FORK")
system.time(grad_g <- parLapply(cl, G, function(Z) lapply(Z, function(x) mean(eval(x, envir = ev)))))
stopCluster(cl)
#
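# Copy coefficients and data columns into the global environment.
# b is a list, so its elements are extracted with b[[z]].
invisible(sapply(seq_along(coefs), function(z) assign(coefs[z], b[[z]], pos = 1)))
invisible(sapply(seq_along(vars), function(z) assign(vars[z], x[, z], pos = 1)))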
nc <- detectCores() - 1
cl <- makeCluster(nc, type = "FORK")
system.time(grad_g <- parLapply(cl, G, function(Z) lapply(Z, function(x) mean(eval(x)))))
stopCluster(cl)
#
On Dec 11, 2015, at 2:16 PM, John Magnotti <john.magnotti at gmail.com> wrote:

Hi Yan,

Sorry it wasn't the simple answer. On my machine, your code creates 3 R processes (#cores -1) and they are all active.

If you try an even simpler example, say

parLapply(cl, 1:8, function_that_takes_a_while)

does that get all the cores going?
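A runnable version of that check might look like the following sketch. The function slow_pid is a hypothetical stand-in for function_that_takes_a_while, and the cluster uses two workers of the default (PSOCK) type to keep it small, whereas the thread uses detectCores() - 1 FORK workers.

```r
library(parallel)

# Two workers keep the sketch small; adjust to detectCores() - 1 as needed.
cl <- makeCluster(2)

# Hypothetical stand-in for function_that_takes_a_while: pause briefly,
# then return the PID of the worker that handled the item.
slow_pid <- function(i) {
  Sys.sleep(0.1)
  Sys.getpid()
}

pids <- unlist(parLapply(cl, 1:8, slow_pid))
stopCluster(cl)

# One table entry per worker that actually received work
print(table(pids))
```

If the table shows fewer distinct PIDs than workers, some workers were never handed any items.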

John

On Fri, Dec 11, 2015 at 6:58 AM, ALPEROVYCH Yan <ALPEROVYCH at em-lyon.com> wrote:
Hi John,

Thank you for the reply. In fact, my original list has 122^2 elements (I simplified the code here), and parLapply still behaves the same way. Could it be something else?

Yan

On Dec 11, 2015, at 1:51 PM, John Magnotti <john.magnotti at gmail.com> wrote:

Hello Yan,

I think parLapply is just assigning a core for every item in the list, G, you supplied. Because you have more cores than items in the list, some of the cores won't receive any work.
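The split can be inspected directly, as a rough sketch. The assumption here is that parLapply divides its input with the same logic as the exported parallel::splitIndices(); the worker count of 7 is read off the top snapshot in the original message rather than stated there.

```r
library(parallel)

n_items   <- 4  # length of G in the original example
n_workers <- 7  # seven busy-or-idle worker processes appear in the top snapshot

# splitIndices() returns one index vector per worker; an empty vector
# means that worker is handed no items.
chunks <- splitIndices(n_items, n_workers)
print(lengths(chunks))

# With fewer items than workers, at most n_items chunks can be non-empty.
busy <- sum(lengths(chunks) > 0)
```

If the non-empty chunks turn out to alternate with empty ones, that would be consistent with the pattern of busy and idle PIDs in the top output.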


John

On Fri, Dec 11, 2015 at 4:00 AM, ALPEROVYCH Yan <ALPEROVYCH at em-lyon.com> wrote:
Hello,

I have a piece of code that needs parallelization, and it used to work just fine (about 6 months ago). However, I had to rerun it yesterday and found that it now behaves strangely: not all worker processes are given work. I wrote a small example that reproduces the issue. Here is a snapshot of the top command:

PID  COMMAND  %CPU  TIME
872  R        97.7  00:02.45
871  R         0.0  00:00.03
870  R        98.2  00:02.93
869  R         0.0  00:00.03
868  R        97.7  00:02.46
867  R         0.0  00:00.03
866  R        94.4  00:02.36
862  R         1.0  00:04.64

Interestingly, mclapply seems to put all workers to work immediately.

So is this intended behavior?

Example that reproduces the issue on my machine:
#
rm(list = ls())
library(parallel)
set.seed(123)
#
expr <- expression(pnorm(b0 + b1*x1 + b2*x2 + b3*x3))
G <- list(
        D(D(expr, "x2"), "b0"),
        D(D(expr, "x2"), "b1"),
        D(D(expr, "x2"), "b2"),
        D(D(expr, "x2"), "b3"))
#
b0 <-  1.2
b1 <-  0.4
b2 <-  0.2
b3 <- -0.6
x1 <- rnorm(10^7)
x2 <- rnorm(10^7)
x3 <- rnorm(10^7)
#
nc <- detectCores() - 1
cl <- makeCluster(nc, type = "FORK")
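# Each element of G is a single call, so evaluate it directly;
# lapply() over a call would iterate over the call's components.
grad_g <- parLapply(cl, G, function(Z) mean(eval(Z)))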
stopCluster(cl)
#
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base




----
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please return it to the sender. The ideas and views expressed in this email are solely those of its author, and do not necessarily represent the views of the institution or company of which the author is an employee. Unauthorized publication, use, distribution, printing or copying of this e-mail or any attached files is strictly forbidden.

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac

