[R-sig-hpc] mclapply() hangs when keras-based neural networks are involved

Simon Urbanek simon.urbanek at r-project.org
Fri Aug 30 16:13:00 CEST 2019


Yep, I fully agree; I ran into the same problem. But even a trained model still uses TF to run the scoring, so all TF limitations still apply. From some searching, I saw that the TF community is aware of the problem, but there is no solution.

Obviously, you can just start n processes, limit the resources of each, and use them for scoring. But, again, note that TF tries to use all available resources, so this gains little unless you are scoring multiple models.
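A minimal sketch of the separate-processes approach in R, using only the base parallel package. Unlike mclapply(), which forks the current session (including any TensorFlow state already initialized in it), makePSOCKcluster() starts fresh R processes, so each worker can set up its own state after it has started. Loading keras and a saved model inside clusterEvalQ() is an assumption about how one would adapt this; here it is replaced by a hypothetical placeholder (worker.state) so the sketch runs without TensorFlow. How to cap TensorFlow's thread usage per worker depends on the TF version and is not shown.

```r
library(parallel)

## Fresh (non-forked) R processes: nothing from the master session,
## in particular no TensorFlow state, is inherited by the workers.
n.workers <- 2
cl <- makePSOCKcluster(n.workers)

## Per-worker initialization, evaluated in each worker's global environment.
## Assumption: with keras one would load the package and the saved model here
## (e.g. via keras::load_model_hdf5()); a placeholder is used instead.
invisible(clusterEvalQ(cl, {
    worker.state <- list(pid = Sys.getpid())  # hypothetical stand-in for a model
    NULL
}))

## Hypothetical scoring function: runs inside a worker and uses that
## worker's own state (here it just reports which process did the work).
score <- function(b) {
    c(case = b, pid = worker.state$pid)
}

res <- parLapply(cl, 1:5, score)
stopCluster(cl)

res <- do.call(rbind, res)  # one row per case: case number and worker PID
print(res)
```

Each case is handled by one of the two worker processes, none of which is the master session.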

In principle, you could save all the weights and score the model with plain C code using those weights. That would be safe, but, depending on the model, likely a lot of duplicate work.
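For a small dense network like the one in the example quoted below, a sketch of this idea fits in a few lines of plain R rather than C. It assumes the weights are a list(W1, b1, W2, b2) in the layout keras::get_weights() returns for that 2 -> 300 (relu) -> 2 (sigmoid) model, i.e. each layer computes activation(x %*% W + b). Random stand-in weights are used so the sketch runs without TensorFlow; because scoring never touches TF, it can be combined with mclapply().

```r
library(parallel)

## Activation functions; both keep the matrix dimensions of their input.
relu    <- function(z) { z[z < 0] <- 0; z }
sigmoid <- function(z) 1 / (1 + exp(-z))

## Forward pass of a dense relu -> sigmoid network in plain R.
## Assumed layout of 'w': list(W1 [2 x 300], b1 [300], W2 [300 x 2], b2 [2]).
score_dense <- function(w, x) {
    h <- relu(sweep(x %*% w[[1]], 2, as.vector(w[[2]]), "+"))  # hidden layer
    sigmoid(sweep(h %*% w[[3]], 2, as.vector(w[[4]]), "+"))    # output layer
}

## Stand-in weights with the shapes of the example model; in practice
## one would use w <- keras::get_weights(NN) after training.
set.seed(271)
w <- list(matrix(rnorm(2 * 300), nrow = 2), rnorm(300),
          matrix(rnorm(300 * 2), ncol = 2), rnorm(2))

## Scoring is now pure R, so forking via mclapply() is unproblematic.
aux <- function(b) score_dense(w, matrix(runif(100 * 2), ncol = 2))
res <- mclapply(1:5, aux, mc.cores = 2)
```

The sweep() calls add each bias vector to every row; a plain `+` would recycle the bias down the columns instead.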

Cheers,
Simon 


> On Aug 30, 2019, at 05:19, Marius Hofert <marius.hofert using uwaterloo.ca> wrote:
> 
> Hi Simon,
> 
> thanks a lot for helping.
> 
> That's a huge let-down... For training neural networks, this seems
> understandable, but it means that once a network is trained, all
> applications that merely evaluate it are restricted to serial computation... *sigh*.
> 
> Cheers,
> M
> 
> 
> 
> 
> On Fri, Aug 30, 2019 at 9:39 AM Simon Urbanek
> <simon.urbanek using r-project.org> wrote:
>> 
>> Marius,
>> 
>> TensorFlow doesn’t support any external parallel computing, including forking. It is assumed that all parallelization is done by TF itself, and it takes over all resources in such a way that they cannot be shared across processes. Hence you cannot combine TF with parallel (nor, by extension, Keras, which runs on TF).
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> On Aug 28, 2019, at 3:32 AM, Marius Hofert <marius.hofert using uwaterloo.ca> wrote:
>>> 
>>> Hi,
>>> 
>>> Below is an example where mclapply() 'hangs' after starting the work
>>> on two cores.
>>> This happens on macOS and Ubuntu (sessionInfo() below). I also see no activity
>>> in 'htop'. lapply() works, though. What is the cause of this behavior?
>>> 
>>> Cheers,
>>> M
>>> 
>>> library(tensorflow)
>>> library(keras)
>>> library(parallel)
>>> ## TensorFlow also needs to be installed, which can be done via
>>> ## install_tensorflow() from R
>>> 
>>> ## 1) Setup
>>> in.lay <- layer_input(shape = 2)
>>> hid.lay <- layer_dense(in.lay,  units = 300, activation = "relu")
>>> out.lay <- layer_dense(hid.lay, units = 2,   activation = "sigmoid")
>>> NN <- keras_model(in.lay, out.lay)
>>> loss_fn <- function(x, y = out.lay) loss_mean_squared_error(x, y)
>>> NN %>% compile(optimizer = "adam", loss = loss_fn)
>>> 
>>> ## 2) Training
>>> NN %>% fit(x = matrix(runif(10000 * 2), ncol = 2), # prior data
>>>          y = matrix(rnorm(10000 * 2), ncol = 2), # training data
>>>          batch_size = 5000, epochs = 1)
>>> 
>>> ## 3) Generate samples by evaluating the NN on a prior sample
>>> aux <- function(b) {
>>>   cat(paste("Working on case",b,"\n"))
>>>   Sys.sleep(2)
>>>   ## mclapply() hangs here (on macOS and Ubuntu)
>>>   predict(NN, x = matrix(runif(100 * 2), ncol = 2))
>>> }
>>> 
>>> ## 4) Call that hangs after the two processes are started
>>> res.serial   <-   lapply(1:5, function(b) aux(b)) # works
>>> res.parallel <- mclapply(1:5, function(b) aux(b), mc.cores = 2) # hangs once both cores are used
>>> 
>>> ## Output:
>>> ## For lapply():
>>> Working on case 1
>>> Working on case 2
>>> Working on case 3
>>> Working on case 4
>>> Working on case 5
>>> ## For mclapply():
>>> Working on case 1
>>> Working on case 2
>>> 
>>> ## sessionInfo() on macOS:
>>> R version 3.6.1 (2019-07-05)
>>> Platform: x86_64-apple-darwin18.7.0 (64-bit)
>>> Running under: macOS Mojave 10.14.6
>>> 
>>> Matrix products: default
>>> BLAS:   /usr/local/R/R-3.6.1_build/lib/libRblas.dylib
>>> LAPACK: /usr/local/R/R-3.6.1_build/lib/libRlapack.dylib
>>> 
>>> locale:
>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.6.1 tools_3.6.1
>>> 
>>> ## sessionInfo() on Ubuntu:
>>> R version 3.6.0 (2019-04-26)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 18.04.3 LTS
>>> 
>>> Matrix products: default
>>> BLAS:   /u/mhofert/soft/R/R-3.6.0_build/lib/libRblas.so
>>> LAPACK: /u/mhofert/soft/R/R-3.6.0_build/lib/libRlapack.so
>>> 
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.6.0
>>> 
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>> 
>> 
> 


