[R] doParallel cores HPC

Ivan Krylov krylov.r00t at gmail.com
Fri Jun 26 09:10:46 CEST 2020


On Thu, 25 Jun 2020 00:29:42 +0000
"Silva, Eder David Borges da" <eder.silva at corteva.com> wrote:

> I have the HPC, with 10 nodes, and each node with 20 cores in UNIX OS.

> cl <- makePSOCKcluster(names = c('Host01', ... , 'Host10'))
 
> Is this code the best way to use all of the machine's power?

The code as written will create one worker _process_ on each of the
hosts. What happens next depends on the code being run and the way
R is installed.

The code may or may not be written to take advantage of multi-core CPUs
(e.g. using OpenMP). In particular, if R is linked with a
multi-threaded BLAS (such as OpenBLAS or MKL) and uses matrix algebra
during the computation, it may spawn multiple _threads_ to utilise the
CPU better. Whether it succeeds depends on multiple factors, including
the size of the task. On occasion I have noticed OpenBLAS threads
spending most of their time in the sched_yield() system call, making
the kernel do a lot of unnecessary work, and set the environment
variable OPENBLAS_NUM_THREADS=1 to make it use only one thread instead.
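A sketch of one way to do that from R, assuming the 'parallel' package and hosts named 'Host01' etc. (placeholders for your actual node names). Note that OpenBLAS typically reads the variable when the library initialises, so the most reliable place to set it is before R starts on each node (e.g. in ~/.Renviron there); setting it from a fresh worker before any BLAS call is a best-effort alternative:

```r
library(parallel)

# Hypothetical host names; substitute the ones on your cluster.
cl <- makePSOCKcluster(names = c("Host01", "Host02"))

# Ask each freshly started worker to request single-threaded BLAS
# before it performs any matrix algebra.
clusterEvalQ(cl, Sys.setenv(OPENBLAS_NUM_THREADS = "1"))

stopCluster(cl)
```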

On the other hand, if the computation is purely single-threaded (or you
disabled the multi-threaded behaviour of OpenMP or BLAS for some
reason), you can spawn 20 workers on each of the 10 hosts:

makePSOCKcluster(names = rep(c('Host01', ..., 'Host10'), each = 20))
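Spelled out (the host names are placeholders for whatever your ten nodes are actually called):

```r
library(parallel)

hosts <- sprintf("Host%02d", 1:10)   # "Host01" ... "Host10"

# One R worker process per core: 10 hosts x 20 cores = 200 workers.
cl <- makePSOCKcluster(names = rep(hosts, each = 20))

# ... parLapply(cl, ...), clusterApply(cl, ...), etc. ...

stopCluster(cl)
```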

You can also try to combine the two approaches by limiting the number
of working threads to a sensible value which results in the threads
spending most of the time computing things (instead of waiting for more
work busy-looping on sched_yield()), then spawning as many processes as
required to utilise all of the cores.
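As an illustration of the combined approach, assuming 5 worker processes per host, each allowed 4 BLAS threads (5 x 4 = 20 cores per node), and assuming the RhpcBLASctl package is installed on the nodes to adjust the thread count after the BLAS library has loaded:

```r
library(parallel)

hosts <- sprintf("Host%02d", 1:10)   # placeholder host names
workers_per_host <- 5                # 5 processes x 4 BLAS threads
blas_threads <- 4                    # = 20 cores per node

cl <- makePSOCKcluster(names = rep(hosts, each = workers_per_host))

# Make the chosen thread count visible on the workers, then apply it.
clusterExport(cl, "blas_threads")
clusterEvalQ(cl, RhpcBLASctl::blas_set_num_threads(blas_threads))

# ... run the computation, then:
stopCluster(cl)
```

The right split between processes and threads depends on the workload; profiling a small run of each configuration is the only reliable way to choose.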

-- 
Best regards,
Ivan



More information about the R-help mailing list