[R-pkg-devel] NOTE: Examples with CPU time > 2.5 times elapsed time
Ivan Krylov
kry|ov@r00t @end|ng |rom gm@||@com
Mon Dec 4 23:42:42 CET 2023
Dear Artur,
You've got a well-written package. There are some parts I wasn't able
to understand (e.g. changing the class of a variable by reference using
SET_CLASS and later changing it back), but there are no obvious places
where a mistake could be hiding.
On Fri, 1 Dec 2023 23:43:32 +0000
Artur Araujo <artur.stat using gmail.com> wrote:
> Examples with CPU time > 2.5 times elapsed time
> user system elapsed ratio
> contour.TPCmsm 2.605 0.132 0.108 25.343
I managed to reproduce this, but not before compiling R-devel r85646
on a Debian Testing virtual machine using clang-17 and flang-17 with
libomp-17-dev installed. In particular, this doesn't seem to happen
with latest R-devel built using (much older versions of) the GNU
toolchain.
There was a false start where I saw pthread_create breakpoints hit
inside libpango while contour() was running, but that couldn't be your
problem because examples run with the pdf() device active. Instead,
when source()ing the TPmsm-Ex.R file, the breakpoint fired elsewhere:
### Name: as.data.frame.survTP
### Title: as.data.frame method for a survTP object
### Aliases: as.data.frame.survTP
### Keywords: manip metho .... [TRUNCATED]
Breakpoint 7, __pthread_create_2_1 (newthread=0x7fffffff5a70, attr=0x7fffffff5a78, start_routine=0x7ffff75882a0 <__kmp_launch_worker()>, arg=0x5555572d9e80)
at ./nptl/pthread_create.c:623
(gdb) bt
#0 __pthread_create_2_1 (newthread=0x7fffffff5a70, attr=0x7fffffff5a78, start_routine=0x7ffff75882a0 <__kmp_launch_worker()>, arg=0x5555572d9e80)
at ./nptl/pthread_create.c:623
#1 0x00007ffff75879b4 in __kmp_create_worker () at build-llvm/tools/clang/stage2-bins/openmp/runtime/src/z_Linux_util.cpp:795
#2 0x00007ffff7518899 in __kmp_allocate_thread () at build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:4677
#3 0x00007ffff751078a in __kmp_allocate_team () at build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:5384
#4 0x00007ffff7511813 in __kmp_fork_call () at build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:2122
#5 0x00007ffff74ffc81 in __kmpc_fork_call () at build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_csupport.cpp:300
#6 0x00007ffff465100e in cens2 (pT1=<optimized out>, pE1=<optimized out>, pS=<optimized out>, pE=<optimized out>, pn=<optimized out>, tfunc=<optimized out>,
pcorr=<optimized out>, pdistpar=<optimized out>, cfunc=<optimized out>, pcenspar=<optimized out>, pstate2prob=<optimized out>) at dgpTP.c:83
#7 dgpTP (n=<optimized out>, corr=<optimized out>, dist=<optimized out>, distpar=<optimized out>, modelcens=<optimized out>, censpar=<optimized out>,
state2prob=<optimized out>) at dgpTP.c:178
#8 0x0000555555645cba in R_doDotCall (fun=0x0, nargs=nargs using entry=7, cargs=cargs using entry=0x7fffffff63b0, call=call using entry=0x5555573b62b8)
at ../../../R-svn/src/main/dotcode.c:1498
(gdb) frame 6
#6 0x00007ffff465100e in cens2 (pT1=<optimized out>, pE1=<optimized out>, pS=<optimized out>, pE=<optimized out>, pn=<optimized out>, tfunc=<optimized out>,
pcorr=<optimized out>, pdistpar=<optimized out>, cfunc=<optimized out>, pcenspar=<optimized out>, pstate2prob=<optimized out>) at dgpTP.c:83
83 #pragma omp parallel num_threads(global_num_threads)
(gdb) p global_num_threads
$4 = 8
Not sure why the CPU time gets counted for the contour() example
instead of as.data.frame.survTP - do libomp threads not get counted
until they terminate or something? - but since the contour.TPCmsm
example directly follows as.data.frame.survTP, the call to dgpTP here
must be the culprit.
I would like to suggest a new diagnostic for the CPU time / elapsed
time tests. If any examples have been found to exceed the time ratio
limit and R is running on Linux with a special environment variable
set, the examples should be re-run under a debugger. The debugger
should set a breakpoint on the clone _system call_ (ideally, all system
calls that can create threads or child processes) and print a C-level
backtrace when it fires. Ideally, it should call something like
R_ConciseTraceback to obtain an R-level traceback too, but it's not an
API (at least yet) and I'm not sure it's generally safe to do this in
the middle of an R call. If there is interest, I can try to see if gdb
can be scripted to make this happen. I think it must be a ptrace
debugger and not an LD_PRELOAD wrapper because there are too many ways
to start threads and processes. For example, with libomp,
pthread_create() breakpoints fire while clone() breakpoints do not,
despite something later does make the clone() system call.
This won't help if the threads get started separately and only later
get utilised in a different example, but I wasn't able to put the
output of `perf report` for good use when debugging similar problems in
a different package. Other ideas for tracing the CPU usage culprits are
also welcome.
--
Best regards,
Ivan
More information about the R-package-devel
mailing list