[Rd] Another issue using multi-processing linear algebra libraries
Mervin Milton Fansler Iv
merv|n@|@n@|er @end|ng |rom br|c@ku@dk
Thu Aug 8 10:45:40 CEST 2024
> "There’s a downside not mentioned in the manual that caught and baffled me for a while. I was using all 64 cores of an AWS instance via parallel::mclapply() and doing matrix multiplications in the parallelized function. If the matrices were big enough the linked BLAS or LAPACK would try to use all 64 cores for each multiplication, which meant 64^2 processes or threads in some combination and that was the end of all useful work. I worked around the problem by rewriting the matrix multiply as “colSums(x * t(y))”. It also worked to build R from source, which I guess uses the built-in BLAS and LAPACK."
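The workaround in the quote trades the BLAS call for elementwise arithmetic and a column sum, which never spawn BLAS threads. A minimal sketch of the identity for the matrix-vector case (the poster didn't give dimensions, so the shapes here are my assumption):

```r
set.seed(1)
n <- 5
x <- rnorm(n)                  # length-n vector
y <- matrix(rnorm(n * n), n)   # n x n matrix

# BLAS path: may fan out to many threads under OpenBLAS/MKL
via_blas <- as.vector(y %*% x)

# Elementwise path: x is recycled down the columns of t(y), so
# colSums(x * t(y))[j] = sum_i x[i] * y[j, i] = (y %*% x)[j]
via_colsums <- colSums(x * t(y))

stopifnot(all.equal(via_blas, via_colsums))
```

Note that the elementwise version materializes `t(y)` and an intermediate product, so it trades memory and single-core speed for predictable threading.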
I believe one can control the number of BLAS threads via the `RhpcBLASctl` package: https://cran.r-project.org/package=RhpcBLASctl

I’ve definitely used it in the other direction, raising the BLAS thread count when `betareg` was *not* multiprocessing: https://stackoverflow.com/a/66540693/570918
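Applied to the oversubscription problem above, a sketch might look like this (assumes `RhpcBLASctl` is installed; `blas_set_num_threads()` is its call for capping BLAS threads, and the guard simply skips the cap if the package is absent):

```r
library(parallel)

# Each forked worker pins BLAS to a single thread before doing any linear
# algebra, so N workers use N threads total rather than N^2.
fit_one <- function(i, x, y) {
  if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    RhpcBLASctl::blas_set_num_threads(1)
  }
  x %*% y
}

x <- matrix(rnorm(200 * 200), 200)
y <- matrix(rnorm(200 * 200), 200)

# mc.cores would be 64 on the AWS instance from the quote; 2 here for a demo
res <- mclapply(seq_len(4), fit_one, x = x, y = y, mc.cores = 2)
```

Because `mclapply()` forks, the setting applies per worker and does not leak back into the parent session.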
> "Does R build its own BLAS and LAPACK if it's also linking external ones?"
No, it will not. On conda-forge there was even some trickery on certain platforms (osx-arm64): an external BLAS/LAPACK was used, but symlinks filled in for the libraries R normally ships (Rblas.dylib, Rlapack.dylib) so that previously built packages with rpath links would still resolve after the swap.
BTW, one can easily select the BLAS/LAPACK implementation on conda-forge. It doesn't provide the R-vendored ones, but the Netlib reference implementation is available, e.g. `conda install 'blas=*=netlib'`. That is, however, also the slowest by all metrics and on all platforms.
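The build-string selector generalizes beyond Netlib; these are conda-forge's metapackage conventions (a sketch only; which variants exist depends on the platform):

```shell
# Select a BLAS/LAPACK implementation for the conda-forge stack
conda install 'blas=*=openblas'   # multithreaded OpenBLAS (the usual default)
conda install 'blas=*=mkl'        # Intel MKL (x86_64 only)
conda install 'blas=*=netlib'     # single-threaded Netlib reference, slowest
```

After switching, `sessionInfo()` in R reports which BLAS/LAPACK libraries are actually loaded.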