[R-pkg-devel] NOTE: Examples with CPU time > 2.5 times elapsed time

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Tue Dec 5 00:23:01 CET 2023


Wow.

On Mon, Dec 4, 2023, 5:43 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:

> Dear Artur,
>
> You've got a well-written package. There are some parts I wasn't able
> to understand (e.g. changing the class of a variable by reference using
> SET_CLASS and later changing it back), but there are no obvious places
> where a mistake could be hiding.
>
> On Fri, 1 Dec 2023 23:43:32 +0000
> Artur Araujo <artur.stat using gmail.com> wrote:
>
> > Examples with CPU time > 2.5 times elapsed time
> >                 user system elapsed  ratio
> > contour.TPCmsm 2.605  0.132   0.108 25.343
>
> I managed to reproduce this, but not before compiling R-devel r85646
> on a Debian Testing virtual machine using clang-17 and flang-17 with
> libomp-17-dev installed. In particular, this doesn't seem to happen
> with latest R-devel built using (much older versions of) the GNU
> toolchain.
>
> There was a false start where I saw pthread_create breakpoints hit
> inside libpango while contour() was running, but that couldn't be your
> problem because examples run with the pdf() device active. Instead,
> when source()ing the TPmsm-Ex.R file, the breakpoint fired elsewhere:
>
> ### Name: as.data.frame.survTP
> ### Title: as.data.frame method for a survTP object
> ### Aliases: as.data.frame.survTP
> ### Keywords: manip metho .... [TRUNCATED]
>
> Breakpoint 7, __pthread_create_2_1 (newthread=0x7fffffff5a70,
> attr=0x7fffffff5a78, start_routine=0x7ffff75882a0 <__kmp_launch_worker()>,
> arg=0x5555572d9e80)
>     at ./nptl/pthread_create.c:623
> (gdb) bt
> #0  __pthread_create_2_1 (newthread=0x7fffffff5a70, attr=0x7fffffff5a78,
> start_routine=0x7ffff75882a0 <__kmp_launch_worker()>, arg=0x5555572d9e80)
>     at ./nptl/pthread_create.c:623
> #1  0x00007ffff75879b4 in __kmp_create_worker () at
> build-llvm/tools/clang/stage2-bins/openmp/runtime/src/z_Linux_util.cpp:795
> #2  0x00007ffff7518899 in __kmp_allocate_thread () at
> build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:4677
> #3  0x00007ffff751078a in __kmp_allocate_team () at
> build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:5384
> #4  0x00007ffff7511813 in __kmp_fork_call () at
> build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_runtime.cpp:2122
> #5  0x00007ffff74ffc81 in __kmpc_fork_call () at
> build-llvm/tools/clang/stage2-bins/openmp/runtime/src/kmp_csupport.cpp:300
> #6  0x00007ffff465100e in cens2 (pT1=<optimized out>, pE1=<optimized out>,
> pS=<optimized out>, pE=<optimized out>, pn=<optimized out>,
> tfunc=<optimized out>,
>     pcorr=<optimized out>, pdistpar=<optimized out>, cfunc=<optimized
> out>, pcenspar=<optimized out>, pstate2prob=<optimized out>) at dgpTP.c:83
> #7  dgpTP (n=<optimized out>, corr=<optimized out>, dist=<optimized out>,
> distpar=<optimized out>, modelcens=<optimized out>, censpar=<optimized
> out>,
>     state2prob=<optimized out>) at dgpTP.c:178
> #8  0x0000555555645cba in R_doDotCall (fun=0x0, nargs=nargs using entry=7,
> cargs=cargs using entry=0x7fffffff63b0, call=call using entry=0x5555573b62b8)
>     at ../../../R-svn/src/main/dotcode.c:1498
> (gdb) frame 6
> #6  0x00007ffff465100e in cens2 (pT1=<optimized out>, pE1=<optimized out>,
> pS=<optimized out>, pE=<optimized out>, pn=<optimized out>,
> tfunc=<optimized out>,
>     pcorr=<optimized out>, pdistpar=<optimized out>, cfunc=<optimized
> out>, pcenspar=<optimized out>, pstate2prob=<optimized out>) at dgpTP.c:83
> 83              #pragma omp parallel num_threads(global_num_threads)
> (gdb) p global_num_threads
> $4 = 8
>
> Not sure why the CPU time gets counted for the contour() example
> instead of as.data.frame.survTP - do libomp threads not get counted
> until they terminate or something? - but since the contour.TPCmsm
> example directly follows as.data.frame.survTP, the call to dgpTP here
> must be the culprit.
>
> I would like to suggest a new diagnostic for the CPU time / elapsed
> time tests. If any examples have been found to exceed the time ratio
> limit and R is running on Linux with a special environment variable
> set, the examples should be re-run under a debugger. The debugger
> should set a breakpoint on the clone _system call_ (ideally, all system
> calls that can create threads or child processes) and print a C-level
> backtrace when it fires. Ideally, it should call something like
> R_ConciseTraceback to obtain an R-level traceback too, but it's not an
> API (at least yet) and I'm not sure it's generally safe to do this in
> the middle of an R call. If there is interest, I can try to see if gdb
> can be scripted to make this happen. I think it must be a ptrace
> debugger and not an LD_PRELOAD wrapper because there are too many ways
> to start threads and processes. For example, with libomp,
> pthread_create() breakpoints fire while clone() breakpoints do not,
> despite something later does make the clone() system call.
>
> This won't help if the threads get started separately and only later
> get utilised in a different example, but I wasn't able to put the
> output of `perf report` for good use when debugging similar problems in
> a different package. Other ideas for tracing the CPU usage culprits are
> also welcome.
>
> --
> Best regards,
> Ivan
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

	[[alternative HTML version deleted]]



More information about the R-package-devel mailing list