[Rd] changes in R-devel and zero-extent objects in Rcpp
Mikael Jagan
j@g@nmn2 @end|ng |rom gm@||@com
Mon Jun 10 16:12:52 CEST 2024
> Date: Sat, 8 Jun 2024 19:16:22 -0400
> From: Ben Bolker <bbolker using gmail.com>
>
> The ASAN errors occur *even if the zero-length object is not actually
> accessed*/is used in a perfectly correct manner, i.e. it's perfectly
> legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0,
> ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an
> ASAN error.
>
> i.e., these are *not* previously cryptic out-of-bounds accesses that
> are now being revealed, but instead sensible and previously legal
> definitions of zero-length objects that are now causing problems.
>
The ASan output is:
> reference binding to misaligned address 0x000000000001 for type 'const
double', which requires 8 byte alignment
That there is a "reference" to 0x1 means that there really _is_ an attempt to
access memory there. The stack trace provided by ASan tells you exactly where
it happens: line 100 of
RcppEigen/inst/include/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h:
for(Index k2=0; k2<depth; k2+=kc)
{
const Index actual_kc = (std::min)(k2+kc,depth)-k2;
// note that the actual rhs is the transpose/adjoint of mat
pack_rhs(blockB, rhs.getSubMapper(k2,0), actual_kc, size);
^^^^^^^^^^^^^^^^^^^^^^
where 'rhs' is an object wrapping the pointer with a method getSubMapper(i, j)
for accessing the data like a matrix. In the first loop iteration, you access
rhs[0]; there is no defensive test for 'rhs' of positive length.
So ASan _is_ revealing an illegal access, complaining only now (since r86629)
because _now_ the address that you access illegally is misaligned.
This really should be avoided in lme4 and ideally reported to Eigen maintainers
if not already fixed there.
Mikael
> I'm pretty sure I'm right about this, but it's absolutely possible
> that I'm just confused at this point; I don't have a super-simple
> example to show you at the moment. The closest is this example by Mikael
> Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049
>
> which shows that if x is a pointer to a zero-length vector (in plain
> C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
> different values.
>
> Mikael further points out that "Rcpp seems to cast a (void *)
> returned by DATAPTR to (double *) when constructing a Vector<REALSXP>
> from a SEXP, rather than using the (double *) returned by REAL." So
> perhaps R-core doesn't want to guarantee that these operations give
> identical answers, in which case Rcpp will have to change the way it
> does things ...
>
> cheers
> Ben
>
>
>
> On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
> > IMHO, this should be changed in both Rcpp and downstream packages:
> >
> > 1. Rcpp could check for out-of-bounds accesses in cases like these, and
> > emit an R warning / error when such an access is detected;
> >
> > 2. The downstream packages unintentionally making these out-of-bounds
> > accesses should be fixed to avoid doing that.
> >
> > That is, I think this is ultimately a bug in the affected packages, but
> > Rcpp could do better in detecting and handling this for client packages
> > (avoiding a segfault).
> >
> > Best,
> > Kevin
> >
> >
> > On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <bbolker using gmail.com
> > <mailto:bbolker using gmail.com>> wrote:
> >
> >
> > A change to R-devel (SVN r86629 or
> >
https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
<https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>
> > has changed the handling of pointers to zero-length objects,
leading to
> > ASAN issues with a number of Rcpp-based packages (the commit
message
> > reads, in part, "Also define STRICT_TYPECHECK when compiling
> > inlined.c.")
> >
> > I'm interested in discussion from the community.
> >
> > Details/diagnosis for the issues in the lme4 package are here:
> > https://github.com/lme4/lme4/issues/794
> > <https://github.com/lme4/lme4/issues/794>, with a bit more
discussion
> > about how zero-length objects should be handled.
> >
> > The short(ish) version is that r86629 enables the
> > CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
> >
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104
<https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,
> > which returns a trivial pointer (rather than the data pointer that
> > would
> > be returned in the normal control flow) if an object has length 0:
> >
> > /* Attempts to read or write elements of a zero length vector will
> > result in a segfault, rather than read and write random
memory.
> > Returning NULL would be more natural, but Matrix seems to
assume
> > that even zero-length vectors have non-NULL data pointers, so
> > return (void *) 1 instead. Zero-length CHARSXP objects
still have a
> > trailing zero byte so they are not handled. */
> >
> > In the Rcpp context this leads to an inconsistency, where
`REAL(x)`
> > is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
> > leads to ASAN warnings like
> >
> > runtime error: reference binding to misaligned address
0x000000000001
> > for type 'const double', which requires 8 byte alignment
> > 0x000000000001: note: pointer points here
> >
> > I'm in over my head and hoping for insight into whether this
> > problem
> > should be resolved by changing R, Rcpp, or downstream Rcpp
packages ...
> >
> > cheers
> > Ben Bolker
More information about the R-devel
mailing list