[Rd] changes in R-devel and zero-extent objects in Rcpp

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Sun Jun 9 01:16:22 CEST 2024


   The ASAN errors occur *even if the zero-length object is not actually 
accessed*/is used in a perfectly correct manner, i.e. it's perfectly 
legal in base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, 
ncol = 0)`, whereas doing the equivalent in Rcpp will (now) lead to an 
ASAN error.

   i.e., these are *not* previously cryptic out-of-bounds accesses that 
are now being revealed, but instead sensible and previously legal 
definitions of zero-length objects that are now causing problems.

    I'm pretty sure I'm right about this, but it's absolutely possible 
that I'm just confused at this point; I don't have a super-simple 
example to show you at the moment. The closest is this example by Mikael 
Jagan: https://github.com/lme4/lme4/issues/794#issuecomment-2155093049

   which shows that if x is a pointer to a zero-length vector (in plain 
C++ for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to 
different values.

   Mikael further points out that "Rcpp seems to cast a (void *) 
returned by DATAPTR to (double *) when constructing a Vector<REALSXP> 
from a SEXP, rather than using the (double *) returned by REAL." So 
perhaps R-core doesn't want to guarantee that these operations give 
identical answers, in which case Rcpp will have to change the way it 
does things ...

   cheers
    Ben



On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
> IMHO, this should be changed in both Rcpp and downstream packages:
> 
> 1. Rcpp could check for out-of-bounds accesses in cases like these, and 
> emit an R warning / error when such an access is detected;
> 
> 2. The downstream packages unintentionally making these out-of-bounds 
> accesses should be fixed to avoid doing that.
> 
> That is, I think this is ultimately a bug in the affected packages, but 
> Rcpp could do better in detecting and handling this for client packages 
> (avoiding a segfault).
> 
> Best,
> Kevin
> 
> 
> On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <bbolker using gmail.com 
> <mailto:bbolker using gmail.com>> wrote:
> 
> 
>          A change to R-devel (SVN r86629 or
>     https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250 <https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250>
>     has changed the handling of pointers to zero-length objects, leading to
>     ASAN issues with a number of Rcpp-based packages (the commit message
>     reads, in part, "Also define STRICT_TYPECHECK when compiling
>     inlined.c.")
> 
>         I'm interested in discussion from the community.
> 
>         Details/diagnosis for the issues in the lme4 package are here:
>     https://github.com/lme4/lme4/issues/794
>     <https://github.com/lme4/lme4/issues/794>, with a bit more discussion
>     about how zero-length objects should be handled.
> 
>         The short(ish) version is that r86629 enables the
>     CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
>     <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104 <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>>,
>     which returns a trivial pointer (rather than the data pointer that
>     would
>     be returned in the normal control flow) if an object has length 0:
> 
>     /* Attempts to read or write elements of a zero length vector will
>          result in a segfault, rather than read and write random memory.
>          Returning NULL would be more natural, but Matrix seems to assume
>          that even zero-length vectors have non-NULL data pointers, so
>          return (void *) 1 instead. Zero-length CHARSXP objects still have a
>          trailing zero byte so they are not handled. */
> 
>         In the Rcpp context this leads to an inconsistency, where `REAL(x)`
>     is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
>     leads to ASAN warnings like
> 
>     runtime error: reference binding to misaligned address 0x000000000001
>     for type 'const double', which requires 8 byte alignment
>     0x000000000001: note: pointer points here
> 
>          I'm in over my head and hoping for insight into whether this
>     problem
>     should be resolved by changing R, Rcpp, or downstream Rcpp packages ...
> 
>         cheers
>          Ben Bolker
> 
>     ______________________________________________
>     R-devel using r-project.org <mailto:R-devel using r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-devel
>     <https://stat.ethz.ch/mailman/listinfo/r-devel>
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
 > E-mail is sent at my convenience; I don't expect replies outside of 
working hours.



More information about the R-devel mailing list