[Rd] changes in R-devel and zero-extent objects in Rcpp

Mikael Jagan j@g@nmn2 @end|ng |rom gm@||@com
Mon Jun 10 16:12:52 CEST 2024


     > Date: Sat, 8 Jun 2024 19:16:22 -0400
     > From: Ben Bolker <bbolker using gmail.com>
     >
     >     The ASAN errors occur *even if the zero-length object is not actually
     > accessed*/is used in a perfectly correct manner, i.e. it's perfectly
     > legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0,
     > ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an
     > ASAN error.
     >
     >     i.e., these are *not* previously cryptic out-of-bounds accesses that
     > are now being revealed, but instead sensible and previously legal
     > definitions of zero-length objects that are now causing problems.
     >

The ASan output is:

     > reference binding to misaligned address 0x000000000001 for type 'const 
double', which requires 8 byte alignment

That there is a "reference" to 0x1 means that there really _is_ an attempt to
access memory there.  The stack trace provided by ASan tells you exactly where
it happens: line 100 of
RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h:

     for(Index k2=0; k2<depth; k2+=kc)
     {
       const Index actual_kc = (std::min)(k2+kc,depth)-k2;


       // note that the actual rhs is the transpose/adjoint of mat
       pack_rhs(blockB, rhs.getSubMapper(k2,0), actual_kc, size);
                        ^^^^^^^^^^^^^^^^^^^^^^

where 'rhs' is an object wrapping the pointer with a method getSubMapper(i, j)
for accessing the data like a matrix.  In the first loop iteration, you access
rhs[0]; there is no defensive test for 'rhs' of positive length.

So ASan _is_ revealing an illegal access, complaining only now (since r86629)
because _now_ the address that you access illegally is misaligned.

This really should be avoided in lme4 and ideally reported to Eigen maintainers
if not already fixed there.

Mikael

     >      I'm pretty sure I'm right about this, but it's absolutely possible
     > that I'm just confused at this point; I don't have a super-simple
     > example to show you at the moment. The closest is this example by Mikael
     > Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049
     >
     >     which shows that if x is a pointer to a zero-length vector (in plain
     > C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
     > different values.
     >
     >     Mikael further points out that "Rcpp seems to cast a (void *)
     > returned by DATAPTR to (double *) when constructing a Vector<REALSXP>
     > from a SEXP, rather than using the (double *) returned by REAL." So
     > perhaps R-core doesn't want to guarantee that these operations give
     > identical answers, in which case Rcpp will have to change the way it
     > does things ...
     >
     >     cheers
     >      Ben
     >
     >
     >
     > On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
     >    > IMHO, this should be changed in both Rcpp and downstream packages:
     >    >
     >    > 1. Rcpp could check for out-of-bounds accesses in cases like these, and
     >    > emit an R warning / error when such an access is detected;
     >    >
     >    > 2. The downstream packages unintentionally making these out-of-bounds
     >    > accesses should be fixed to avoid doing that.
     >    >
     >    > That is, I think this is ultimately a bug in the affected packages, but
     >    > Rcpp could do better in detecting and handling this for client packages
     >    > (avoiding a segfault).
     >    >
     >    > Best,
     >    > Kevin
     >    >
     >    >
     >    > On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <bbolker using gmail.com
     >    > <mailto:bbolker using gmail.com>> wrote:
     >    >
     >    >
     >    >           A change to R-devel (SVN r86629 or
     >    > 
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 
<https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>
     >    >      has changed the handling of pointers to zero-length objects, 
leading to
     >    >      ASAN issues with a number of Rcpp-based packages (the commit 
message
     >    >      reads, in part, "Also define STRICT_TYPECHECK when compiling
     >    >      inlined.c.")
     >    >
     >    >          I'm interested in discussion from the community.
     >    >
     >    >          Details/diagnosis for the issues in the lme4 package are here:
     >    >      https://github.com/lme4/lme4/issues/794
     >    >      <https://github.com/lme4/lme4/issues/794>, with a bit more 
discussion
     >    >      about how zero-length objects should be handled.
     >    >
     >    >          The short(ish) version is that r86629 enables the
     >    >      CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
     >    > 
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104 
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,
     >    >      which returns a trivial pointer (rather than the data pointer that
     >    >      would
     >    >      be returned in the normal control flow) if an object has length 0:
     >    >
     >    >      /* Attempts to read or write elements of a zero length vector will
     >    >           result in a segfault, rather than read and write random 
memory.
     >    >           Returning NULL would be more natural, but Matrix seems to 
assume
     >    >           that even zero-length vectors have non-NULL data pointers, so
     >    >           return (void *) 1 instead. Zero-length CHARSXP objects 
still have a
     >    >           trailing zero byte so they are not handled. */
     >    >
     >    >          In the Rcpp context this leads to an inconsistency, where 
`REAL(x)`
     >    >      is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
     >    >      leads to ASAN warnings like
     >    >
     >    >      runtime error: reference binding to misaligned address 
0x000000000001
     >    >      for type 'const double', which requires 8 byte alignment
     >    >      0x000000000001: note: pointer points here
     >    >
     >    >           I'm in over my head and hoping for insight into whether this
     >    >      problem
     >    >      should be resolved by changing R, Rcpp, or downstream Rcpp 
packages ...
     >    >
     >    >          cheers
     >    >           Ben Bolker



More information about the R-devel mailing list