[R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks

Thu Nov 14 22:50:43 CET 2024

В Thu, 14 Nov 2024 16:24:16 +0000
Mauricio Vargas Sepulveda <m.sepulveda using mail.utoronto.ca> пишет:

> After enabling the SAN flags, I cannot reproduce the gcc-san error
> [2].

Can you use the rocker/r-devel-san container? It reproduces the problem
for me.

When reading galapagos/cg15.dic, FuzzyEntityParser::ParseEntities()
keeps advancing over the file and failing to parse a single entity
until it eventually calls stop() because it didn't find any entities.

In a non-sanitized build, it first succeeds at 0-based offset 1095. In
a sanitized build, it fails for all offsets. I think this is due to the
ordering of the byte reads:
https://github.com/pachadotdev/open-redatam/blob/bbb65242f1af5f601def1c0b971ed601d459b4f3/src/readers/ByteArrayReader.cpp#L176-L192

In C++, an operation like the following:

static_cast<uint16_t>(ReadByte()) << 8 |
static_cast<uint16_t>(ReadByte());

...depends on the order in which the compiler will choose to evaluate
the calls to static_cast<uint16_t>(ReadByte()), and this order is not
guaranteed to be left-to-right:
https://en.cppreference.com/w/cpp/language/eval_order

I edited all four byte-reading functions and replaced the one-statement
operations with separate statements for each of the byte reads:

--- redatam.orig/src/redatamlib/readers/ByteArrayReader.cpp     2024-11-09 02:12:17.000000000 +0000
+++ redatam.new/src/redatamlib/readers/ByteArrayReader.cpp      2024-11-14 21:25:54.000000000 +0000
@@ -175,23 +175,27 @@
 }

 uint16_t ByteArrayReader::ReadInt16LE() {
-  return static_cast<uint16_t>(ReadByte()) |
-         (static_cast<uint16_t>(ReadByte()) << 8);
+  uint16_t a = static_cast<uint16_t>(ReadByte());
+  uint16_t b = static_cast<uint16_t>(ReadByte()) << 8;
+  return a | b;
 }

 uint32_t ByteArrayReader::ReadInt32LE() {
-  return static_cast<uint32_t>(ReadInt16LE()) |
-         static_cast<uint32_t>(ReadInt16LE()) << 16;
+  uint32_t a = static_cast<uint32_t>(ReadInt16LE());
+  uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16;
+  return a | b;
 }

 uint16_t ByteArrayReader::ReadInt16BE() {
-  return (static_cast<uint16_t>(ReadByte()) << 8) |
-         static_cast<uint16_t>(ReadByte());
+ uint16_t a= (static_cast<uint16_t>(ReadByte()) << 8);
+ uint16_t b= static_cast<uint16_t>(ReadByte());
+ return a| b;
 }

 uint32_t ByteArrayReader::ReadInt32BE() {
-  return (static_cast<uint32_t>(ReadInt16BE()) << 16) |
-         static_cast<uint32_t>(ReadInt16BE());
+  uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16;
+  uint32_t a = static_cast<uint32_t>(ReadInt16LE());
+  return b | a;
 }

 }  // namespace RedatamLib

...and this seems to make the error vanish. I think I see the
misordering too. In the output of objdump -d
redatam.Rcheck/redatam/libs/redatam.so, I see:

0000000000267010 <_ZN10RedatamLib15ByteArrayReader11ReadInt16LEEv>:

  267028:       e8 93 6f f5 ff          call   1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv using plt>
                                               first_byte <- ReadByte()
  26702d:       41 89 c4                mov    %eax,%r12d
                                               save first byte in r12
  26703d:       41 c1 e4 08             shl    $0x8,%r12d
                                               left-shift the first byte!
  267041:       e8 7a 6f f5 ff          call   1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv using plt>
                                               second byte in eax
  26704a:       44 09 e0                or     %r12d,%eax
                                               OR them together

...which is how you read big-endian numbers, not little-endian ones.

-- 
Best regards,
Ivan