[Bioc-devel] [External] Re: Strange "internal logical NA value has been modified" error

Pariksheet Nanda p@r|k@heet@n@nd@ @end|ng |rom uconn@edu
Thu Oct 14 02:20:53 CEST 2021


Cheers, Luke!

Indeed, the bug could not be caught both using the address sanitizer 
approach pointed out by Henrik nor valgrind in this particular case.

But adding the watchpoint to R_LogicalNAValue worked!

I now have the call stack of the offending functions sample() and 
simulate().  For anyone else who finds the need to do this, do *not* 
specify the SEXP datatype when setting your watchpoint as suggested here 
on StackOverflow (https://stackoverflow.com/a/31202563).  My guess is 
specifying the datatype perhaps causes swapping between the addresses 
spanned by the large SEXP object to the very limited (typically ~8 
registers) for trapping the memory write.  I mention this "wrong way" in 
my steps  below.  To Martin's point, a bit of work was needed to reduce 
to a reliable minimum test case which was turned out to be:

   devtools::install()
   ### In the package "./tests/" directory containing "testthat.R"
   ### run a new R session e.g. `R -f minimum_script.R` with the lines:
   library(tsshmm)
   testthat::test_dir("testthat", package = "tsshmm",
                      filter = "train|tss")

Thank you all for your invaluable help,
Pariksheet


Steps to get the call stack:

1) Compile R from source and invoke the debugger:

    R --version | head -1
    # R version 4.1.1 (2021-08-10) -- "Kick Things"
    cd /tmp
    wget https://cloud.r-project.org/src/base/R-4/R-4.1.1.tar.gz
    tar -xf R-4.1.1.tar.gz
    mkdir build-R
    cd build-R/
    ../R-4.1.1/configure  --silent --without-x CFLAGS="-g -O2 -pipe"
    make -j  # -j with C++ will murder your RAM, but C is frugal.
    ./bin/R --debugger=gdb


2) Now that you're in gdb, here's how you set the watchpoint and get the 
back trace:

    (gdb) break memory.c:Rf_InitMemory

    # Breakpoint 1 at 0x177710: file ../../../R-4.1.1/src/main/memory.c, 
line 2136.

    (gdb) run

    # ...
    # Breakpoint 1, Rf_InitMemory () at 
../../../R-4.1.1/src/main/memory.c:2136
    # 2136	    init_gctorture();

    (gdb) clear memory.c:Rf_InitMemory

    # Deleted breakpoint 1

    (gdb) print R_LogicalNAValue

    # $1 = (SEXP) 0x0

    (gdb) watch R_LogicalNAValue

    # Hardware watchpoint 2: R_LogicalNAValue

    (gdb) continue

    # Hardware watchpoint 2: R_LogicalNAValue
    #
    # Old value = (SEXP) 0x0
    # New value = (SEXP) 0x555555a18140
    # Rf_InitMemory () at ../../../R-4.1.1/src/main/memory.c:2227
    # 2227	    LOGICAL(R_LogicalNAValue)[0] = NA_LOGICAL;

    (gdb) info watchpoints

    # Num     Type           Disp Enb Address            What
    # 2       hw watchpoint  keep y                      R_LogicalNAValue
    #         breakpoint already hit 1 time

    (gdb) delete 2

    (gdb) info watchpoints

    # No watchpoints.

    (gdb) print R_LogicalNAValue

    # $2 = (SEXP) 0x555555a18140

    (gdb) watch *(SEXP) 0x555555a18140 # <- *** THIS IS VERY SLOW ***

    Watchpoint 3: *(SEXP) 0x555555a18140

    (gdb) delete 3

    (gdb) watch * 0x555555a18140 # <- *** Better ***

    # Watchpoint 4: * 0x555555a18140

    (gdb) disable

    (gdb) continue
    ...
    > setwd("~/src/tsshmm/tests/")
    > library(tsshmm)
    ...
    > ^C

    # Program received signal SIGINT, Interrupt.
    # 0x00007ffff7038ff7 in __GI___select (nfds=nfds using entry=1, 
readfds=readfds using entry=0x5555559f7780 <readMask.14775>, 
writefds=writefds using entry=0x0, exceptfds=exceptfds using entry=0x0,
    #     timeout=timeout using entry=0x0) at 
../sysdeps/unix/sysv/linux/select.c:41
    # 41	../sysdeps/unix/sysv/linux/select.c: No such file or directory.

    (gdb) enable

    (gdb) continue

    > testthat::test_dir("testthat", package = "tsshmm", filter = 
"train|tss")

    # ...
    # ✔ | F W S  OK | Context
    # ⠏ |         0 | train
    # Thread 1 "R" hit Hardware watchpoint 4: * 0x555555a18140

    # Old value = 822083626
    # New value = 805306410
    # RunGenCollect (size_needed=<optimized out>) at 
../../../R-4.1.1/src/main/memory.c:1687
    # 1687			s = next;

    (gdb) backtrace

    #0  RunGenCollect (size_needed=<optimized out>) at 
../../../R-4.1.1/src/main/memory.c:1687
    #1  R_gc_internal (size_needed=<optimized out>) at 
../../../R-4.1.1/src/main/memory.c:3129
    #2  0x00005555556cab98 in Rf_allocVector3 (type=<optimized out>, 
type using entry=13, length=length using entry=626, allocator=allocator using entry=0x0) 
at ../../../R-4.1.1/src/main/memory.c:2775
    #3  0x00005555555c14e0 in Rf_allocVector (length=626, type=13) at 
../../../R-4.1.1/src/include/Rinlinedfuns.h:595
    #4  PutRNGstate () at ../../../R-4.1.1/src/main/RNG.c:444
    #5  0x00007ffff44cac5e in sample (model=model using entry=0x555557da5fa0, 
n=3, from=from using entry=0, f=f using entry=0x7ffff44cabf0 <prob_emis>) at 
simulate.c:23
    #6  0x00007ffff44cad8b in simulate (model=0x555557da5fa0, 
obs=<optimized out>, ncol=ncol using entry=100, nrow=nrow using entry=10000) at 
simulate.c:101
    #7  0x00007ffff44ca651 in C_simulate (obs=0x7fffed59c010, 
dim=<optimized out>, trans=<optimized out>, emis=<optimized out>, 
emis_tied=<optimized out>, start=<optimized out>)
        at R_wrap_tsshmm.c:98
    ...


On 10/13/21 8:27 AM, luke-tierney using uiowa.edu wrote:
> *Message sent from a system outside of UConn.*
> 
> 
> The most likely culprit is C code that is modifying a logical vector
> without checking whether this is legitimate for R semantics
> (i.e. making sure MAYBE_REFERENCED or at least MAYBE_SHARED is FALSE).
> If that is the case, then this is legitimate for C code to do in
> principle, so UBSAN and valgrind won't help. You need to set a gdb
> watchpoint on the location, catch where it is modified, and look up
> the call stack from there.
> 
> The error signaled in the GC is a sanity check for catching that this
> sort of misbehavior has happened in C code. But it is a check after
> the fact; it can't tell you more that that the problem happened
> sometime before it was detected.
> 
> Best,
> 
> luke
> 
> On Wed, 13 Oct 2021, Martin Morgan wrote:
> 
>> The problem with using gdb is you'd find yourself in the garbage 
>> collector, but perhaps quite removed from where the corruption 
>> occurred, e.g., gc() might / will likely be triggered after you've 
>> returned to the top-level evaluation loop, and the part of your code 
>> that did the corruption might be off the stack.
>>
>> The problem with devtools::check() (and R CMD check) is that running 
>> the unit tests occurs in a separate process, so things like setting a 
>> global option (and even system variable from within R) may not be 
>> visible in the process doing the check. Conversely, for the same 
>> reasons, it seems like the problem can be tickled by running the tests 
>> alone. So
>>
>>  R -f <your_source>/tests/testthat.R
>>
>> would seem to be a good enough starting point.
>>
>> Actually, I liked Henrik's UBSAN suggestion, which requires the least 
>> amount of work. I think I'd then try
>>
>>  R -d valgrind -f <your_source>/tests/testthat.R
>>
>> and then further into the weeds... actually from the section of R-exts 
>> you mention
>>
>>  R_C_BOUNDS_CHECK=yes R -f <your_source>/tests/testthat.R
>>
>> might also be promising.
>>
>> Martin
>>
>> On 10/12/21, 10:30 PM, "Bioc-devel on behalf of Pariksheet Nanda" 
>> <bioc-devel-bounces using r-project.org on behalf of 
>> pariksheet.nanda using uconn.edu> wrote:
>>
>>    Hi all,
>>
>>    On 10/12/21 6:43 PM, Pariksheet Nanda wrote:
>>    >
>>    > Error in `...`: internal logical NA value has been modified
>>
>>    In the R source code, this error is in src/main/memory.c so I was
>>    thinking one way of investigating might be to run `R --debugger gdb`,
>>    then running R to load the symbols and either:
>>
>>    1) set a breakpoint for when it reaches that particular line in
>>    memory.c:R_gc_internal and then walk up the stack,
>>
>>    2) or set a watch point on memory.c:R_gc_internal:R_LogicalNAValue
>>    (somehow; having trouble getting gdb to reach that context).
>>
>>    3) Then I thought, maybe this is getting far into the weeds and 
>> instead
>>    I could check the most common C related error by enabling bounds
>>    checking of my C arrays per section 4.4 of the R-exts manual:
>>
>>    $ R -q
>>     > options(CBoundsCheck = TRUE)
>>     > Sys.setenv(R_C_BOUNDS_CHECK = "yes") # Try both ways *shrug*
>>     > devtools::test()
>>    ... # All tests still pass.
>>     > devtools::check()
>>    ... # No change :(
>>
>>    Maybe I'm not sure I'm using that option correctly?  Or the option is
>>    ignored in devtools::check().  Or indeed, the error is not from over
>>    running C array boundaries.
>>
>>    It turns out that using the precompiled debug symbols[1] isn't all 
>> that
>>    useful here because I don't get line numbers in gdb without the source
>>    files and many symbols are optimized out, so it looks like I would 
>> need
>>    to compile R from source with -ggdb first instead of using the Debian
>>    packages.
>>
>>    Hopefully this is still the right approach?
>>
>>    Pariksheet
>>
>>
>>    [1] After install r-base-core-dbg on Debian for the debug symbols.
>>
>>    _______________________________________________
>>    Bioc-devel using r-project.org mailing list
>>    https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> -- 
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



More information about the Bioc-devel mailing list