[Rd] Another issue using multi-processing linear algebra libraries
Ivan Krylov
ikrylov sending from disroot.org
Thu Aug 8 12:43:08 CEST 2024
On Wed, 7 Aug 2024 07:47:38 -0400,
Dipterix Wang <dipterix.wang using gmail.com> wrote:
> I wonder if R initiates a system environment or options to instruct
> the packages on the number of cores to use?
A lot of thought and experience with various HPC systems went into
availableCores(), a function from the zero-dependency 'parallelly'
package by Henrik Bengtsson:
https://search.r-project.org/CRAN/refmans/parallelly/html/availableCores.html
If you cannot accept a pre-created cluster object, a 'future' plan,
'BiocParallel' parameters, or an OpenMP thread count from the user,
this is a safer default than parallel::detectCores().
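As a hedged sketch (assuming the 'parallelly' package may or may not be installed on the user's machine), a package could pick its default worker count like this:

```r
# Pick a default worker count. parallelly::availableCores() respects
# cgroups, job-scheduler limits, and options such as mc.cores, unlike
# parallel::detectCores(), which reports the raw hardware count.
n_workers <- if (requireNamespace("parallelly", quietly = TRUE)) {
  parallelly::availableCores()
} else {
  1L  # conservative fallback: never over-subscribe by default
}
```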
Building such a limiter into R poses a number of problems. Here is a
summary from a previous discussion on R-pkg-devel [1] with wise
contributions from Dirk Eddelbuettel, Reed A. Cartwright, Vladimir
Dergachev, and Andrew Robbins.
- R is responsible for the BLAS it is linked to and therefore must
actively manage the BLAS threads when the user sets a thread
limit. This requires writing BLAS-specific code to talk to the
libraries, as is done in FlexiBLAS and the RhpcBLASctl package. Some
BLASes (like ATLAS) only have a compile-time thread limit. R should
somehow give all threads to BLAS by default but take them away when
some other form of parallelism is requested.
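A minimal sketch of such active management, using the RhpcBLASctl package mentioned above; this only works with BLASes that expose a runtime thread-control API (e.g. OpenBLAS or MKL), not with ATLAS:

```r
# Temporarily take threads away from BLAS while running some other
# form of parallelism, then give them back afterwards.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  old <- RhpcBLASctl::blas_get_num_procs()  # current BLAS thread count
  RhpcBLASctl::blas_set_num_threads(1)      # BLAS now single-threaded
  # ... run forked workers / OpenMP code here ...
  RhpcBLASctl::blas_set_num_threads(old)    # restore the previous limit
}
```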
- Should R be managing the OpenMP thread limit by itself? If not,
that's a lot of extra work for every OpenMP-using package developer.
If yes, R is now responsible for initialising OpenMP.
- Managing the BLAS and OpenMP thread limits is already a hard problem
because some BLASes may or may not follow the OpenMP thread limit.
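The only library-agnostic knob here is the environment. As a sketch (with the caveat that OpenMP-built BLASes usually, but not always, honour this variable):

```r
# Must be set before any OpenMP-using library spawns its threads;
# in most implementations, once the OpenMP runtime is initialised,
# changing the variable no longer affects it.
Sys.setenv(OMP_NUM_THREADS = "2")
```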
- What if two packages both consult the thread limit and create N^2
processes as a result of one calling the other? Dividing a single
computer between BLAS threads, OpenMP threads, child processes and
their threads needs a very reliable global inter-process semaphore.
R would have to grow a jobserver like the one in GNU Make, running as
a separate process (the main R thread will be blocked waiting for the
computation result), especially if we want to automatically recover
job slots from crashed processes. That's probably not impossible,
but it involves a lot of OS-specific code.
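A toy calculation of the N^2 problem described above (the core counts are made up for illustration):

```r
total_cores <- 8L
workers <- total_cores             # outer package takes every core
threads_per_worker <- total_cores  # each child consults the same limit
demand <- workers * threads_per_worker
# 64 threads on an 8-core machine: 8-fold over-subscription.

# A cooperative split has to divide the budget instead:
threads_per_worker <- max(1L, total_cores %/% workers)
```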
- What happens to the thread limit when starting remote R processes?
It's best to avoid having to set it manually. If multiple people
unknowingly start R on a shared server, how do we keep the R
instances from competing for the CPU (or for ownership of the
semaphore)?
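One hedged workaround, assuming a PSOCK cluster from the 'parallel' package: propagate the limit to the worker processes explicitly, so that no remote R claims the whole machine:

```r
library(parallel)

cl <- makePSOCKcluster(2)
# Cap OpenMP (and, usually, OpenMP-built BLAS) threads on each worker.
clusterEvalQ(cl, Sys.setenv(OMP_NUM_THREADS = "1"))
vals <- clusterEvalQ(cl, Sys.getenv("OMP_NUM_THREADS"))
stopCluster(cl)
```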
- It will take a lot of political power to actually make this scheme
work. The limiter can only be cooperative (unless you override the
clone() syscall and make it fail? I expect everything to crash after
that), so it only takes one piece of software unknowingly ignoring
the limit to break everything.
--
Best regards,
Ivan
[1] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009956.html