[R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks
Mauricio Vargas Sepulveda
m@@epu|ved@ @end|ng |rom m@||@utoronto@c@
Fri Nov 15 13:16:38 CET 2024
Dear R Developers,
I implemented Dr. Krylov's fix, and now redatam is back to CRAN (Point 1).
Dr. Eddelbuettel's suggestion about not fully mimicking CRAN gcc-san setup was also correct (Point 2).
Point 1:
- When the big endian issue was described here, I tried the s390x image from R-Hub, which does not work on my laptop. Then I went to a big endian server that we have at UofT and that reproduced the error!
Point 2:
- I was testing with docker.io/rocker/r-devel-san:latest, and I could not replicate the error.
- I ended up using the R-Hub image with https://github.com/r-hub/containers/pull/81#issuecomment-2478009166, and that replicated the error line by line.
Besides it, I implemented the same fix for the command line tool and the Python package. I also changed the example data by aggregating Uruguay's census to show some numbers. Just cutting the levels and show region/city is not informative. Being Redatam a closed format with no standard or specification, I ended up reading the census with my pkg, aggregating with dplyr, saving to CSV, and then converting back to Redatam with a point-and-click tool following an unofficial tutorial from You Tube. There is no easy way to aggregate and save in the same format as we do with SPSS.
I added Dr. Krylov to ctb for this very useful fix! Thanks a lot to Dr. Krylov and Dr. Eddelbuettel for the suggestions, I was going in circles being unable to replicate the issue and I now asked Dr. Ligges for the possibility to know more about the CRAN specific configuration to add it to R-Hub.
Best,
——
Mauricio "Pachá" Vargas Sepúlveda
PhD Student, Political Science
University of Toronto
________________________________________
From: Ivan Krylov <ikrylov using disroot.org>
Sent: November 14, 2024 4:50 PM
To: Mauricio Vargas Sepulveda <m.sepulveda using mail.utoronto.ca>
Cc: r-package-devel using r-project.org <r-package-devel using r-project.org>
Subject: Re: [R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks
В Thu, 14 Nov 2024 16:24:16 +0000
Mauricio Vargas Sepulveda <m.sepulveda using mail.utoronto.ca> пишет:
> After enabling the SAN flags, I cannot reproduce the gcc-san error
> [2].
Can you use the rocker/r-devel-san container? It reproduces the problem
for me.
When reading galapagos/cg15.dic, FuzzyEntityParser::ParseEntities()
keeps advancing over the file and failing to parse a single entity
until it eventually calls stop() because it didn't find any entities.
In a non-sanitized build, it first succeeds at 0-based offset 1095. In
a sanitized build, it fails for all offsets. I think this is due to the
ordering of the byte reads:
https://github.com/pachadotdev/open-redatam/blob/bbb65242f1af5f601def1c0b971ed601d459b4f3/src/readers/ByteArrayReader.cpp#L176-L192
In C++, an operation like the following:
static_cast<uint16_t>(ReadByte()) << 8 |
static_cast<uint16_t>(ReadByte());
...depends on the order in which the compiler will choose to evaluate
the calls to static_cast<uint16_t>(ReadByte()), and this order is not
guaranteed to be left-to-right:
https://en.cppreference.com/w/cpp/language/eval_order
I edited all four byte-reading functions and replaced the one-statement
operations with separate statements for each of the byte reads:
--- redatam.orig/src/redatamlib/readers/ByteArrayReader.cpp 2024-11-09 02:12:17.000000000 +0000
+++ redatam.new/src/redatamlib/readers/ByteArrayReader.cpp 2024-11-14 21:25:54.000000000 +0000
@@ -175,23 +175,27 @@
}
uint16_t ByteArrayReader::ReadInt16LE() {
- return static_cast<uint16_t>(ReadByte()) |
- (static_cast<uint16_t>(ReadByte()) << 8);
+ uint16_t a = static_cast<uint16_t>(ReadByte());
+ uint16_t b = static_cast<uint16_t>(ReadByte()) << 8;
+ return a | b;
}
uint32_t ByteArrayReader::ReadInt32LE() {
- return static_cast<uint32_t>(ReadInt16LE()) |
- static_cast<uint32_t>(ReadInt16LE()) << 16;
+ uint32_t a = static_cast<uint32_t>(ReadInt16LE());
+ uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16;
+ return a | b;
}
uint16_t ByteArrayReader::ReadInt16BE() {
- return (static_cast<uint16_t>(ReadByte()) << 8) |
- static_cast<uint16_t>(ReadByte());
+ uint16_t a= (static_cast<uint16_t>(ReadByte()) << 8);
+ uint16_t b= static_cast<uint16_t>(ReadByte());
+ return a| b;
}
uint32_t ByteArrayReader::ReadInt32BE() {
- return (static_cast<uint32_t>(ReadInt16BE()) << 16) |
- static_cast<uint32_t>(ReadInt16BE());
+ uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16;
+ uint32_t a = static_cast<uint32_t>(ReadInt16LE());
+ return b | a;
}
} // namespace RedatamLib
...and this seems to make the error vanish. I think I see the
misordering too. In the output of objdump -d
redatam.Rcheck/redatam/libs/redatam.so, I see:
0000000000267010 <_ZN10RedatamLib15ByteArrayReader11ReadInt16LEEv>:
267028: e8 93 6f f5 ff call 1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv using plt>
first_byte <- ReadByte()
26702d: 41 89 c4 mov %eax,%r12d
save first byte in r12
26703d: 41 c1 e4 08 shl $0x8,%r12d
left-shift the first byte!
267041: e8 7a 6f f5 ff call 1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv using plt>
second byte in eax
26704a: 44 09 e0 or %r12d,%eax
OR them together
...which is how you read big-endian numbers, not little-endian ones.
--
Best regards,
Ivan
More information about the R-package-devel
mailing list