[Rd] Another issue using multi-processing linear algebra libraries
Ivan Krylov
ikrylov sending from disroot.org
Thu Aug 8 12:43:08 CEST 2024
On Wed, 7 Aug 2024 07:47:38 -0400,
Dipterix Wang <dipterix.wang using gmail.com> wrote:
> I wonder if R initiates a system environment or options to instruct
> the packages on the number of cores to use?
A lot of thought and experience with various HPC systems went into
availableCores(), a function from the zero-dependency 'parallelly'
package by Henrik Bengtsson:
https://search.r-project.org/CRAN/refmans/parallelly/html/availableCores.html
If you cannot accept a pre-created cluster object, a 'future' plan,
'BiocParallel' parameters, or an OpenMP thread count from the user,
this is a safer default than parallel::detectCores().
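As a hedged sketch (assuming the 'parallelly' package may or may not be installed on the user's machine), a package could pick its default worker count like this:

```r
# Pick a default worker count. parallelly::availableCores() respects
# cgroups, job-scheduler limits, and options such as mc.cores, unlike
# parallel::detectCores(), which reports the raw hardware count.
n_workers <- if (requireNamespace("parallelly", quietly = TRUE)) {
  parallelly::availableCores()
} else {
  1L  # conservative fallback: never over-subscribe by default
}
```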
Building such a limiter into R poses a number of problems. Here is a
summary from a previous discussion on R-pkg-devel [1] with wise
contributions from Dirk Eddelbuettel, Reed A. Cartwright, Vladimir
Dergachev, and Andrew Robbins.
- R is responsible for the BLAS it is linked to and therefore must
actively manage the BLAS threads when the user sets a thread
limit. This requires writing BLAS-specific code to talk to the
libraries, as is done in FlexiBLAS and the RhpcBLASctl package. Some
BLASes (like ATLAS) only have a compile-time thread limit. R should
somehow give all threads to BLAS by default but take them away when
some other form of parallelism is requested.
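A minimal sketch of such active management, using the RhpcBLASctl package mentioned above; this only works with BLASes that expose a runtime thread-control API (e.g. OpenBLAS or MKL), not with ATLAS:

```r
# Temporarily take threads away from BLAS while running some other
# form of parallelism, then give them back afterwards.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  old <- RhpcBLASctl::blas_get_num_procs()  # current BLAS thread count
  RhpcBLASctl::blas_set_num_threads(1)      # BLAS now single-threaded
  # ... run forked workers / OpenMP code here ...
  RhpcBLASctl::blas_set_num_threads(old)    # restore the previous limit
}
```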
- Should R be managing the OpenMP thread limit by itself? If not,
that's a lot of extra work for every OpenMP-using package developer.
If yes, R is now responsible for initialising OpenMP.
- Managing the BLAS and OpenMP thread limits is already a hard problem
because some BLASes may or may not follow the OpenMP thread limit.
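The only library-agnostic knob here is the environment. As a sketch (with the caveat that OpenMP-built BLASes usually, but not always, honour this variable):

```r
# Must be set before any OpenMP-using library spawns its threads;
# in most implementations, once the OpenMP runtime is initialised,
# changing the variable no longer affects it.
Sys.setenv(OMP_NUM_THREADS = "2")
```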
- What if two packages both consult the thread limit and create N^2
processes as a result of one calling the other? Dividing a single
computer between BLAS threads, OpenMP threads, child processes and
their threads needs a very reliable global inter-process semaphore.
R would have to grow a jobserver like the one in GNU Make, running as
a separate process (the main R thread will be blocked waiting for the
computation result), especially if we want to automatically recover
job slots from crashed processes. That's probably not impossible,
but it involves a lot of OS-specific code.
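A toy calculation of the N^2 problem described above (the core counts are made up for illustration):

```r
total_cores <- 8L
workers <- total_cores             # outer package takes every core
threads_per_worker <- total_cores  # each child consults the same limit
demand <- workers * threads_per_worker
# 64 threads on an 8-core machine: 8-fold over-subscription.

# A cooperative split has to divide the budget instead:
threads_per_worker <- max(1L, total_cores %/% workers)
```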
- What happens to the thread limit when starting remote R processes?
It's best to avoid having to set it manually. If multiple people
unknowingly start R on a shared server, how do we keep the R
instances from competing for the CPU (or for ownership of the
semaphore)?
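One hedged workaround, assuming a PSOCK cluster from the 'parallel' package: propagate the limit to the worker processes explicitly, so that no remote R claims the whole machine:

```r
library(parallel)

cl <- makePSOCKcluster(2)
# Cap OpenMP (and, usually, OpenMP-built BLAS) threads on each worker.
clusterEvalQ(cl, Sys.setenv(OMP_NUM_THREADS = "1"))
vals <- clusterEvalQ(cl, Sys.getenv("OMP_NUM_THREADS"))
stopCluster(cl)
```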
- It will take a lot of political power to actually make this scheme
work. The limiter can only be cooperative (unless you override the
clone() syscall and make it fail? I expect everything to crash after
that), so it only takes one piece of software unknowingly ignoring
the limit to break everything.
--
Best regards,
Ivan
[1] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009956.html