[Rd] Use of C++ in Packages

Tomas Kalibera tomas.kalibera using gmail.com
Thu Apr 25 11:58:21 CEST 2019


On 4/24/19 6:41 PM, Hugh Marera wrote:
> Some of us are learning about development in R and use R in our work 
> data analysis pipelines. What is the best way to identify packages 
> that currently have these C++ problems? I would like to be able to 
> help fix the bugs but more importantly not use these packages in 
> critical work pipelines. Any C++ R package bug squashing events out there?

I think the best way available now is manual inspection/review of the 
source code of the packages you are using for your critical work. Such 
a review should cover more than just dangerous use of C++: many 
problems also exist in plain C code (using unexported API from R, 
violating the value semantics of R, other kinds of PROTECT errors, 
memory leaks due to long jumps, etc.). The review could be limited to 
the context of your pipeline, that is, to how the package is used there 
and whether you have a reliable external process for validating the 
results.
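
As a rough illustration of a memory leak due to a long jump, here is a 
minimal toy sketch. It does not use the R API: toy_error(), 
counted_malloc(), counted_free(), leaky(), careful() and leaked_after() 
are all hypothetical names, and the longjmp only stands in for the way 
an R error leaves native code without running the cleanup that follows.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <cstdlib>

// Toy stand-ins only; none of these names are R API.
static std::jmp_buf toplevel;   // where a toy "error" jumps back to
static int live_buffers = 0;    // buffers allocated but not yet freed

static void toy_error() { std::longjmp(toplevel, 1); }

static void* counted_malloc(std::size_t n) {
    ++live_buffers;
    return std::malloc(n);
}

static void counted_free(void* p) {
    assert(live_buffers > 0);
    std::free(p);
    --live_buffers;
}

// Leaks when fail is true: the jump skips counted_free().
void leaky(bool fail) {
    void* buf = counted_malloc(64);
    if (fail) toy_error();
    counted_free(buf);
}

// One possible fix: do everything that can fail before allocating.
void careful(bool fail) {
    if (fail) toy_error();
    void* buf = counted_malloc(64);
    counted_free(buf);
}

// Runs fn(fail) under a toy top level; returns how many buffers leaked.
int leaked_after(void (*fn)(bool), bool fail) {
    live_buffers = 0;
    if (setjmp(toplevel) == 0) fn(fail);
    return live_buffers;
}
```

In this model, leaked_after(leaky, true) reports one leaked buffer, 
while careful() leaks nothing on the same error path. In real package 
code, Writing R Extensions also documents R_alloc, whose memory R 
reclaims itself, as an alternative to malloc/free for temporary 
buffers.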

Out of the problems I've mentioned in my blog, the worst for normal use 
of packages is probably a PROTECT error on the fast path due to 
allocation in a destructor or other function run automatically. Various 
memory leaks or correctness problems on error paths (long jumps) may not 
be a complete showstopper if you restart R often and if you have a 
reliable way of validating results, but such issues would still make it 
much harder to diagnose problems.
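
To make the destructor case concrete, here is a minimal toy model, not 
real R code. Obj, toy_alloc(), toy_protect(), toy_unprotect(), Tracer, 
buggy() and fixed() are hypothetical stand-ins for R objects, 
allocation, PROTECT and UNPROTECT. The toy collector runs on every 
allocation (roughly like gctorture) and only marks objects dead, so the 
outcome can be inspected safely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for R's heap; none of these names are R API.
struct Obj { bool alive = true; };

static std::vector<Obj*> heap;            // every object ever allocated
static std::vector<Obj*> protect_stack;   // objects the toy GC must keep

// Mark every unprotected object as collected (never actually freed,
// so this toy leaks on purpose and stays safe to inspect).
static void toy_gc() {
    for (Obj* o : heap) {
        bool kept = false;
        for (Obj* p : protect_stack) {
            if (p == o) kept = true;
        }
        if (!kept) o->alive = false;
    }
}

// Every allocation first triggers a collection.
static Obj* toy_alloc() {
    toy_gc();
    heap.push_back(new Obj());
    return heap.back();
}

static Obj* toy_protect(Obj* o) { protect_stack.push_back(o); return o; }

static void toy_unprotect(std::size_t n) {
    assert(n <= protect_stack.size());
    protect_stack.resize(protect_stack.size() - n);
}

// A class whose destructor allocates, e.g. to record a message.
struct Tracer {
    ~Tracer() { toy_alloc(); }
};

Obj* buggy() {
    Tracer trace;
    Obj* ans = toy_protect(toy_alloc());
    toy_unprotect(1);
    return ans;   // ~Tracer runs after this point, ans is unprotected
}

Obj* fixed() {
    Obj* ans = nullptr;
    {
        Tracer trace;
        ans = toy_protect(toy_alloc());
    }             // ~Tracer runs here, while ans is still protected
    toy_unprotect(1);
    return ans;
}
```

In this model buggy() hands back an object the toy collector has 
already marked dead, even though its PROTECT/UNPROTECT calls look 
balanced, while fixed() returns a live one. Narrowing the scope of the 
C++ object is only one possible way to avoid the problem.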

Simple first steps include looking at the CRAN check results for any 
errors, warnings, notes, or reports from analyzers (valgrind, ASAN, 
UBSAN, rchk). The analyzers _may_ be able to spot a PROTECT error 
due to allocation in a destructor if one is lucky (in the case I 
mentioned in the blog, there was an ASAN report), but I think manual 
inspection is needed, and it can also reveal other problems.

Tomas

>
> Regards
>
> Hugh
>
> On Mon, Apr 1, 2019 at 6:23 PM Tomas Kalibera 
> <tomas.kalibera using gmail.com> wrote:
>
>     On 3/30/19 8:59 AM, Romain Francois wrote:
>     > tl;dr: we need better C++ tools and documentation.
>     >
>     > We collectively know more now with the rise of tools like rchk
>     > and improved documentation such as Tomas’s post. That’s a start,
>     > but it appears that there still is a lot of knowledge that would
>     > deserve to be promoted to actual documentation of best practices.
>
>     Well, there is quite a bit of knowledge in Writing R Extensions and
>     many problems could have been prevented had it been read more
>     thoroughly by package developers. The problem that C++ runs some
>     functions automatically (like destructors) should not be too hard
>     to identify based on what WRE says about the need for protection
>     against garbage collection.
>
>     From my experience, one can learn most about R internals from
>     debugging and reading source code. When debugging PROTECT errors
>     and other memory errors/memory corruption, common problems caused
>     by bugs in native C/C++ code, one needs to read and understand the
>     source code involved at all layers, one needs to understand the
>     documentation covering code at different layers, and one has to
>     think about these things, forming hypotheses, narrowing down to
>     smaller examples, etc.
>
>     My suggestion for package authors who write native code and want to
>     learn more, and who want to be responsible (these kinds of bugs
>     affect other packages indirectly and can be woken up by
>     inconsequential and correct code changes, even in the R runtime):
>     test and debug your code hard - look at UBSAN/ASAN/valgrind/rchk
>     checks from CRAN and run these tools yourself if needed. Run with
>     strict barrier checking and with gctorture. Write more tests to
>     increase the coverage. Specifically now, if you use C++ code, try
>     to read all of your related code and check you do not have the
>     problems I mentioned in my blog. Think of other related problems
>     and if you find out about them, tell others. Make sure you only
>     use the API from Writing R Extensions (and the R help system). If
>     you really can't find anything wrong with your package, but still
>     want to learn more, try to debug some bugs reported against the R
>     runtime or against your favorite packages (or their CRAN check
>     reports from various tools). In addition to learning more about R
>     internals, by spending much more time on debugging you may also
>     get a different perspective on some of the things about C++ I
>     pointed to. Finally, it would help us with the problem we have now
>     - that many R packages in C++ have serious bugs.
>
>     Tomas
>
>     ______________________________________________
>     R-devel using r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-devel
>

