[Rd] Difficult debug

Ivan Krylov |kry|ov @end|ng |rom d|@root@org
Wed Feb 7 22:30:38 CET 2024


On Wed, 07 Feb 2024 14:01:44 -0600
"Therneau, Terry M., Ph.D. via R-devel" <r-devel using r-project.org> wrote:

>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)  
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
<...>
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
<...>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'

An unrelated allocation function suddenly dereferencing a null pointer
(address 0x10 is just a small offset from NULL) is likely an indication
of heap corruption. Valgrind may be silent about it because the C heap
(which it knows how to override and track) is still intact, while R's
own memory management metadata has been corrupted (an overwrite that
looks like a valid memory access to Valgrind).

An easier route may be more instrumentation.

R can tell Valgrind to treat some memory accesses as invalid if you
configure it with --with-valgrind-instrumentation [*], but I'm not
sure it will be able to trap overwrites of GC metadata, so let's set it
aside for now.

If you compile your own R, you can configure it with -fsanitize=address
added to the compiler and linker flags [**]. I'm not sure whether the
bounds checks performed by AddressSanitizer would be sufficient to
catch the problem, but it's worth a try. Instead of compiling R with
sanitizers, it should also be possible to use the container image
docker.io/rocker/r-devel-san.

If no instrumentation lets you pinpoint the error, the hard option
remains. Since the first memory error that Valgrind reports already
results in a SIGSEGV, you can run R in a regular debugger and try to
work backwards from the local variables at the location of the crash.
Maybe there's a way to identify the block containing the pointer that
gets overwritten and set a watchpoint on it for the next run of R.
Maybe you can read the overwritten value as a double and guess where
the number came from. If your processor is sufficiently new, you can
try `rr`, the time-travelling debugger [***], to rewind execution back
to the point where the pointer gets overwritten.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-valgrind

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Address-Sanitizer

[***]
https://rr-project.org
Judging by the domain name, it's practically designed to fix troublesome
bugs in R packages!


