[R-pkg-devel] How to identify what flags "Additional Issues" for UBSAN

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Thu Mar 23 12:06:17 CET 2023


On Wed, 22 Mar 2023 15:51:54 +0000,
"Kenny, Christopher" <christopherkenny using fas.harvard.edu> wrote:

> Is there an easy way to identify what is causing the flag?

Unfortunately, it's _all_ the sanitizer errors.

> wilson.cpp:165:34: runtime error: signed integer overflow:
> -2147483648 * -1 cannot be represented in type 'int'

> However, my understanding is that these errors should be expected, as
> the input is controlled from within the package and checking for
> these types of errors in every loop would push against the purpose of
> using C++ for performance here.

You'd be right to think that signed integer overflow just wraps around
on modern CPUs with no adverse effects on the rest of the execution.
Unfortunately, you'd also need to convince the C++ optimiser, and it's
currently allowed to think otherwise.

In C++, signed integer overflow (and other similar errors, such as
casting NaN to an integer) is undefined behaviour, which, according to
the standard, means that anything can happen after that, ranging from
nothing out of order to a crash and also to silent corruption of
important research results. Other languages define integer overflow to
have a more limited impact (wrap the value around or at least guarantee
a crash), but not C and C++. [*]
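To illustrate the multiplication from the sanitizer message, here is a
minimal sketch of one way to multiply without ever performing the
overflowing operation. The name checked_mul is made up for this example
and is not from your package, and the sketch assumes a 32-bit int (true
on the platforms R runs on, but not guaranteed by the standard):

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper, not part of the package under discussion.
// Multiplies two ints without ever doing an overflowing int multiplication.
// Returns false (and leaves `out` untouched) if the product does not fit.
// Assumes int is 32 bits wide, so the product always fits in 64 bits.
bool checked_mul(int a, int b, int &out) {
    // Widen one operand first; the multiplication then happens in int64_t.
    const std::int64_t wide = static_cast<std::int64_t>(a) * b;
    if (wide < INT_MIN || wide > INT_MAX) return false;
    out = static_cast<int>(wide);  // in range, so the conversion is exact
    return true;
}
```

With this, checked_mul(INT_MIN, -1, out) reports failure instead of
invoking UB. The check is a single comparison on a widened value, so it
is unlikely to dominate the cost of a loop body.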

Thankfully, I only see one spot where you encounter UB, in
src/wilson.cpp line 165, which should be relatively easy to fix by
adjusting your strategy for calculating the maximum number of tries.
(Do you get a NaN when `remaining` is -1? Why is it -1? Or is it 0?)
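I don't know what exactly line 165 computes, so the following is only a
sketch of the general idea: if the number of tries is derived from a
floating-point estimate, check for NaN and clamp the range before
converting to int. The function clamp_tries, its parameters and the
choice to fall back to `cap` on NaN are all assumptions made for
illustration:

```cpp
#include <cmath>

// Hypothetical helper: turn a floating-point estimate of the number of
// tries into an int in [1, cap]. Assumes cap >= 1.
// Converting NaN or an out-of-range double to int is UB, so both cases
// are handled before the cast.
int clamp_tries(double estimate, int cap) {
    if (std::isnan(estimate)) return cap;  // policy choice: NaN falls back to cap
    if (estimate >= cap) return cap;       // also covers +Inf
    if (estimate <= 1.0) return 1;         // also covers -Inf and negatives
    return static_cast<int>(estimate);     // now strictly between 1 and cap
}
```

This runs once per call rather than once per loop iteration, so it
should not cost anything measurable.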

-- 
Best regards,
Ivan

[*]
Some old gcc version used to launch the game Nethack when certain kinds
of UB were encountered. The situation has improved in some ways since
then and worsened in others: modern C and C++ compilers, especially
Clang, use UB to guide their optimisation efforts, so if the optimiser
can prove "if A is true then UB happens", it optimises with the
assumption that A is false.
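A small sketch of what this means in practice (the function names are
invented for this example). The first function looks like an overflow
test, but it can only return true if `x + 1` overflows, which is UB, so
the optimiser is allowed to compile it to `return false`. The second one
asks the same question without doing the arithmetic:

```cpp
#include <climits>

// Broken: the comparison can only be true after signed overflow, which is
// UB, so the compiler may assume it never happens and always return false.
bool next_overflows_broken(int x) { return x + 1 < x; }

// Well-defined: compare against the limit instead of overflowing first.
bool next_overflows(int x) { return x == INT_MAX; }
```

Whether the broken version actually misbehaves depends on the compiler
and the optimisation level, which is part of what makes such bugs hard
to notice without a sanitizer.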

Combined with optimiser bugs, this understanding of undefined
behaviour may lead to funny situations where the inability of the
compiler to prove certain facts about the program leads to
mathematically inconsistent results:
https://blog.regehr.org/archives/140