[R-pkg-devel] Too many cores used in examples (not caused by data.table)

Ivan Krylov
Tue Oct 24 14:55:16 CEST 2023


On Tue, 24 Oct 2023 10:37:48 +0000,
"Helske, Jouni" <jouni.helske using jyu.fi> wrote:

> Examples with CPU time > 2.5 times elapsed time
>           user system elapsed ratio
> exchange 1.196   0.04   0.159 7.774

I've downloaded the archived copy of the package from the CRAN FTP
server, installed it and tried:

library(bssm)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
data("exchange")
model <- svm(
 exchange, rho = uniform(0.97,-0.999,0.999),
 sd_ar = halfnormal(0.175, 2), mu = normal(-0.87, 0, 2)
)
system.time(particle_smoother(model, particles = 500))
#    user  system elapsed
#   0.515   0.000   0.073

I set a breakpoint on clone() [*] and got quite a few calls creating
OpenMP threads with the following call stack:

#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:52
<...>
#4  0x00007ffff7314e0a in GOMP_parallel () from
/usr/lib/x86_64-linux-gnu/libgomp.so.1
 <-- RcppArmadillo code below
#5 0x00007ffff38f5f00 in
arma::eglue_core<arma::eglue_div>::apply<arma::Mat<double>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_exp>,
arma::eop_scalar_times>, arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_scalar_div_post>, arma::eop_square> > (outP=..., x=...) at
.../library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:69
#6 0x00007ffff3a31246 in
arma::Mat<double>::operator=<arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> (X=..., this=0x7fffffff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Proxy.hpp:226
#7
arma::Col<double>::operator=<arma::eGlue<arma::eOp<arma::eOp<arma::Col<double>,
arma::eop_exp>, arma::eop_scalar_times>,
arma::eOp<arma::eOp<arma::Col<double>, arma::eop_scalar_div_post>,
arma::eop_square>, arma::eglue_div> > ( X=..., this=0x7fffffff36f0) at
.../library/RcppArmadillo/include/armadillo_bits/Col_meat.hpp:535
 <-- bssm code below
#8  ssm_ung::laplace_iter (this=0x7fffffff15e0, signal=...) at
model_ssm_ung.cpp:310
#9  0x00007ffff3a36e9e in ssm_ung::approximate (this=0x7fffffff15e0) at
.../library/RcppArmadillo/include/armadillo_bits/arrayops_meat.hpp:27
#10 0x00007ffff3a3b3d3 in ssm_ung::psi_filter
(this=this@entry=0x7fffffff15e0, nsim=nsim@entry=500, alpha=...,
weights=..., indices=...) at model_ssm_ung.cpp:517
#11 0x00007ffff3948cd7 in psi_smoother (model_=..., nsim=nsim@entry=500,
seed=seed@entry=1092825895, model_type=model_type@entry=3) at
R_psi.cpp:131

What does arma::eglue_core do?

(gdb) list
/* reformatted a bit */
library/RcppArmadillo/include/armadillo_bits/mp_misc.hpp:64
 int n_threads = (std::min)(
  int(arma_config::mp_threads),
  int((std::max)(int(1), int(omp_get_max_threads())))
 );
(gdb) p arma_config::mp_threads
$3 = 8
(gdb) p (int)omp_get_max_threads()
$4 = 16
(gdb) p (char*)getenv("OMP_THREAD_LIMIT")
$7 = 0x555556576b91 "2"
(gdb) p /x (int)omp_get_thread_limit()
$9 = 0x7fffffff

Sorry for misinforming you about the OMP_THREAD_LIMIT environment
variable: the OpenMP specification requires the program to ignore
modifications to the environment variables after the program has
started [**], so it only works if R is started with OMP_THREAD_LIMIT
set. Additionally, the OpenMP thread limit is not supposed to be
adjusted at runtime at all [***].

Unfortunately for our situation, Armadillo insists on setting
its own number of threads from arma_config::mp_threads (which is
constexpr 8 unless you #define ARMA_OPENMP_THREADS while compiling it)
and omp_get_max_threads (which is the upper bound on the number of
threads that cannot be adjusted at runtime).

What I'm about to suggest is a terrible hack, but since Armadillo seems
to lack the option to set the number of threads at runtime, there might
be no other option.

In every source file, before the first #include of an Armadillo header:

1. #include <omp.h> so that the OpenMP functions are declared and the
#include guard is set

2. Define a static inline function get_number_of_threads returning the
desired number of threads as an int (e.g. referencing an extern int
number_of_threads stored elsewhere)

3. #define omp_get_max_threads get_number_of_threads
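
A minimal sketch of the three steps above, untested against bssm itself.
The setter set_number_of_threads, the default of 2 and the helper
threads_armadillo_would_use are assumptions made for illustration; the
helper only mimics the expression from mp_misc.hpp shown in the gdb
listing and is not Armadillo code. In a real package the counter would
be an extern int defined in one .cpp file rather than a static.

```cpp
// Hypothetical header; include it before any Armadillo header.
#ifdef _OPENMP
#include <omp.h>  // Step 1: declare the real OpenMP API before the macro exists.
#endif
#include <algorithm>

// Step 2: the package's own thread count. In a real package this would be
// "extern int number_of_threads;" with a single definition in one .cpp file;
// it is static here only to keep the sketch in one file.
static int number_of_threads = 2;

static inline int get_number_of_threads() { return number_of_threads; }

// Hypothetical setter that an exported R-level wrapper could call.
// Values below 1 are clamped to 1.
static inline void set_number_of_threads(int n) {
  number_of_threads = (n < 1) ? 1 : n;
}

// Step 3: from here on, any code that names omp_get_max_threads
// (including headers included later) calls our function instead.
#define omp_get_max_threads get_number_of_threads

// Stand-in for the expression quoted from mp_misc.hpp (not Armadillo code):
// the smaller of the compile-time cap and the redirected runtime value.
static inline int threads_armadillo_would_use(int mp_threads) {
  return (std::min)(mp_threads, (std::max)(1, int(omp_get_max_threads())));
}
```

With the compile-time cap left at 8, the stand-in yields 2 by default,
and 8 after set_number_of_threads(16), because the cap still applies.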

Now if you provide an API for the R code to get and set this number, it
should be possible to control the number of threads used by OpenMP code
in Armadillo. Basically, a data.table::setDTthreads() for the copy of
Armadillo inlined inside your package.

If you then compile your package with a large #define
ARMA_OPENMP_THREADS, it will both be able to use more than 8 threads
*and* limit itself when needed.

An alternative course of action is compiling your package with #define
ARMA_OPENMP_THREADS 2 and giving up on more OpenMP threads inside calls
to Armadillo.
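
As a fragment (not a complete file), the define would have to precede
the first Armadillo include in every source file of the package. This
assumes the package includes RcppArmadillo.h directly; passing the
macro on the compiler command line should be equivalent.

```cpp
// Must come before the first Armadillo include in every source file
// (or be passed to the compiler as -DARMA_OPENMP_THREADS=2 instead).
#define ARMA_OPENMP_THREADS 2
#include <RcppArmadillo.h>
```

For the hack described earlier, the same line with a larger value would
raise the compile-time cap instead.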

-- 
Best regards,
Ivan

[*]
https://github.com/tidymodels/textrecipes/pull/251#issuecomment-1775549814

[**]
https://www.openmp.org/spec-html/5.2/openmpch21.html#x432-59000021

[***]
https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf#page=15



