[Bioc-devel] eval() and evalq() of BiocGenerics used in Rcpp causes segfault in some cases

Martin Morgan m@rtin@morg@n @ending from ro@wellp@rk@org
Thu Jun 7 12:36:33 CEST 2018



On 06/07/2018 04:56 AM, Joris Meys wrote:
> Dear all,
> 
> the following issue drew my attention. The new package "conflicted"
> manipulates the search path and by doing so, highlighted that Rcpp is using
> the BiocGenerics version of evalq() in case BiocGenerics is loaded.
> Otherwise it uses the base version.
> 
> This is easily fixed in Rcpp by using base::evalq() - ( which will require
> all packages based on Rcpp to be recompiled. )
> 
> I wondered whether there is code that expects Rcpp to use the dispatching
> of the BiocGenerics version of eval() and that would fail after the fix is
> applied. Honestly, imho it shouldn't, but better safe than sorry. If you
> have pointers, I can check the patch in these contexts and hopefully
> prevent possible problems before they arise.

The issues with Rcpp are a little like a package that didn't 'import' a 
symbol it wanted, and so gets stuck using whatever is on the search 
path. Rcpp wanted base::evalq, as well as base::tryCatch and 
base::conditionMessage, etc. The fact that there's a BiocGenerics::evalq 
is irrelevant to the internal workings of Rcpp, just as it is irrelevant 
to the internal workings of AnnotationDbi that there's a dplyr::select.

Lionel's comment 
https://github.com/RcppCore/Rcpp/issues/861#issuecomment-395360615 
provides a caveat, but to me (as a non-Rcpp user) the direct use of 
Rcpp_eval() sounds very much like messing with it's internal code; a 
package wanting to use BiocGenerics::evalq() at the C++ level should 
specify that, rather than relying on Rcpp's evalq. I don't actually 
know, though, whether there are code paths from the public interface 
that lead through Rcpp_eval().

I'll also say that (a) the bug was real and unrelated to conflicted or 
BiocGenerics::evalq, and that (b) somewhat ironically it was exactly the 
sort of bug `conflicted` is designed to identify (use of unscoped 
variables) though in a surprising way, in that the segfault when using 
conflicted with GEOquery (actually, xml2) was the result of Rcpp 
mis-handling an error when evalq failed due to conflicted, rather than 
BiocGenerics::evalq somehow causing problems or conflicted being broken 
or ....

Secondarily, once the Rcpp bug was tackled, it turns out that conflicted 
finds an additional conflict, this time with BiocGenerics::sort / 
base::sort. And again the problem is quite interesting. In it's C++ 
code, dplyr wants to use base::sort(), but instead asks for sort() on 
the search() path, where conflicted reports that there are two 
definitions. So again conflicted is identifying the problem that it is 
meant to identify, but with the interesting features that (a) dplyr's 
C++ code is not scoped properly, it should have specified base::sort() 
or evaluated the call to sort in the dplyr environment; this is really 
interesting, and shows how C / C++ code can easily escape the namespace 
checks that R is able to provide at the package level. (b) 
BiocGenerics::sort() and BiocGenerics generics in general (including 
evalq) 'do the right thing' and pass dispatch to the symbol that they 
are making generics. So the presence of both BiocGenerics::sort and 
base::sort on the search path is a false positive, and is unlike the 
conflict between AnnotationDbi::select and dplyr::select which do not 
share a common base function.

Finally, reading through the thread on the conflicted bug report 
https://github.com/r-lib/conflicted/issues/8 (which is very interesting) 
shows that conflicted will actually get smarter, so correctly 
implemented S4 generics, like those in BiocGenerics, will not be flagged 
as false positives.

Thanks for bringing this up Joris, I'd encourage Bioconductor Rcpp 
developers to update their Rcpp to the version on github (the patch has 
been ported to master), REINSTALL ALL PACKAGES USING RCPP (painful), and 
ensuring that there are no problems, e.g., R CMD build && R CMD check. 
I'd also encourage these developers to use conflicted and other 
approaches to identify problematic C++ code of the sort found in dplyr 
-- use of unscoped function calls.

Martin

> 
> Reference to the issue :
> https://github.com/RcppCore/Rcpp/issues/861#issuecomment-395199660
> 
> Cheers
> 


This email message may contain legally privileged and/or...{{dropped:2}}



More information about the Bioc-devel mailing list