[Rd] [External] Re: changes in R-devel and zero-extent objects in Rcpp

luke-tierney using uiowa.edu
Sun Jun 9 03:27:41 CEST 2024


On Sat, 8 Jun 2024, Ben Bolker wrote:

>  The ASAN errors occur *even if the zero-length object is not actually 
> accessed*/is used in a perfectly correct manner, i.e. it's perfectly legal in 
> base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol = 0)`, 
> whereas doing the equivalent in Rcpp will (now) lead to an ASAN error.
>
>  i.e., these are *not* previously cryptic out-of-bounds accesses that are 
> now being revealed, but instead sensible and previously legal definitions of 
> zero-length objects that are now causing problems.
>
>   I'm pretty sure I'm right about this, but it's absolutely possible that 
> I'm just confused at this point; I don't have a super-simple example to show 
> you at the moment. The closest is this example by Mikael Jagan: 
> https://github.com/lme4/lme4/issues/794#issuecomment-2155093049
>
>  which shows that if x is a pointer to a zero-length vector (in plain C++ 
> for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to different 
> values.
>
>  Mikael further points out that "Rcpp seems to cast a (void *) returned by 
> DATAPTR to (double *) when constructing a Vector<REALSXP> from a SEXP, rather 
> than using the (double *) returned by REAL." So perhaps R-core doesn't want 
> to guarantee that these operations give identical answers, in which case Rcpp 
> will have to change the way it does things ...

It looks like REAL and friends should also get this check, but it's
not high priority at this point, at least to me. DATAPTR has been
using this check for a while in a barrier build, so you might want to
test there as well. I expect we will activate more integrity checks
from the barrier build on the API client side as things are tidied up.
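To make the inconsistency concrete, here is a minimal C++ sketch. It does not use R's headers: `MockVector`, `mock_dataptr`, `mock_real_unchecked` and `mock_real_checked` are hypothetical stand-ins. The sketch assumes only the behaviour described in this thread: DATAPTR applies the zero-length check, while REAL currently does not. Giving the REAL-like accessor the same check makes the two agree again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a REALSXP: a length and its storage.
// This models the behaviour described in the thread, not R's code.
struct MockVector {
    std::size_t length;
    double* data;
};

// Sentinel returned for zero-length vectors, as in the memory.c comment.
inline void* zero_length_sentinel() {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(1));
}

// DATAPTR-like accessor: applies the zero-length check.
void* mock_dataptr(const MockVector& v) {
    assert(v.length == 0 || v.data != nullptr);
    return v.length == 0 ? zero_length_sentinel() : static_cast<void*>(v.data);
}

// REAL-like accessor as it currently behaves: no zero-length check.
double* mock_real_unchecked(const MockVector& v) {
    return v.data;
}

// REAL-like accessor with the same check, so it agrees with mock_dataptr.
double* mock_real_checked(const MockVector& v) {
    return static_cast<double*>(mock_dataptr(v));
}
```

With the unchecked version, code that takes its pointer from the DATAPTR-like accessor and code that uses the REAL-like one see different addresses for the same empty vector. For non-empty vectors all three agree.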

However: DATAPTR is not in the API and can't be, at least in this form:
it allows access to a writable pointer to STRSXP and VECSXP data, and
that is too dangerous for memory manager integrity. I'm not sure
exactly how this will be resolved, but be prepared for changes.
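Why a writable pointer into STRSXP/VECSXP storage is risky can be illustrated with a heavily simplified generational write barrier. This is a hypothetical model: the `Obj` type, `set_elt`, `raw_store` and `remembered_set` are illustrative names, not R's memory manager. A barrier-respecting store records an old object that now points to a young one. A store through a raw pointer silently skips that bookkeeping, so a minor collection could miss the young object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified heap object. Not R's SEXPREC.
struct Obj {
    int generation;            // 0 = young, higher = older
    std::vector<Obj*> slots;   // pointer payload, like VECSXP elements
};

// Old objects that may point at younger ones. A minor collection would
// scan these in addition to the roots.
std::vector<Obj*> remembered_set;

// Barrier-respecting store, in the spirit of SET_VECTOR_ELT.
void set_elt(Obj* parent, std::size_t i, Obj* child) {
    assert(i < parent->slots.size());
    if (child != nullptr && parent->generation > child->generation)
        remembered_set.push_back(parent);  // record old -> young edge
    parent->slots[i] = child;
}

// Store through a writable data pointer: the barrier is bypassed, so an
// old -> young edge created here is never recorded.
void raw_store(Obj** data, std::size_t i, Obj* child) {
    data[i] = child;
}
```

After one `set_elt` and one `raw_store` into the same old object, the remembered set holds a single entry even though the object now references two young objects.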

Best,

luke

>
>  cheers
>   Ben
>
>
>
> On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
>> IMHO, this should be changed in both Rcpp and downstream packages:
>> 
>> 1. Rcpp could check for out-of-bounds accesses in cases like these, and 
>> emit an R warning / error when such an access is detected;
>> 
>> 2. The downstream packages unintentionally making these out-of-bounds 
>> accesses should be fixed to avoid doing that.
>> 
>> That is, I think this is ultimately a bug in the affected packages, but 
>> Rcpp could do better in detecting and handling this for client packages 
>> (avoiding a segfault).
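A minimal sketch of the first suggestion, assuming a hypothetical read-only view type: `CheckedNumericView` is illustrative, not an Rcpp class. Element reads are bounds-checked and throw `std::out_of_range` instead of dereferencing past the end. In an Rcpp-based package, an uncaught C++ exception is typically turned into an R error by the generated wrapper code, which gives the kind of R-level error suggested here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical read-only view over a numeric vector's storage.
class CheckedNumericView {
public:
    CheckedNumericView(const double* data, std::size_t size)
        : data_(data), size_(size) {
        // A non-empty view must have real storage behind it.
        assert(size == 0 || data != nullptr);
    }

    std::size_t size() const { return size_; }

    // Bounds-checked read: throws instead of touching memory out of range.
    // Returns by value, so no reference to the storage is ever formed.
    double at(std::size_t i) const {
        if (i >= size_)
            throw std::out_of_range("index " + std::to_string(i) +
                                    " out of bounds for length " +
                                    std::to_string(size_));
        return data_[i];
    }

private:
    const double* data_;
    std::size_t size_;
};
```

Returning by value rather than by `const double&` also means a zero-length view never binds a reference to whatever address its data pointer happens to hold.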
>> 
>> Best,
>> Kevin
>> 
>> 
>> On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <bbolker using gmail.com> wrote:
>> 
>>
>>          A change to R-devel (SVN r86629 or
>>     https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250)
>>     has changed the handling of pointers to zero-length objects, leading to
>>     ASAN issues with a number of Rcpp-based packages (the commit message
>>     reads, in part, "Also define STRICT_TYPECHECK when compiling
>>     inlined.c.")
>>
>>         I'm interested in discussion from the community.
>>
>>         Details/diagnosis for the issues in the lme4 package are here:
>>     https://github.com/lme4/lme4/issues/794, with a bit more discussion
>>     about how zero-length objects should be handled.
>>
>>         The short(ish) version is that r86629 enables the
>>     CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
>>     <https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104>,
>>     which returns a trivial pointer (rather than the data pointer that
>>     would be returned in the normal control flow) if an object has length 0:
>>
>>     /* Attempts to read or write elements of a zero length vector will
>>          result in a segfault, rather than read and write random memory.
>>          Returning NULL would be more natural, but Matrix seems to assume
>>          that even zero-length vectors have non-NULL data pointers, so
>>          return (void *) 1 instead. Zero-length CHARSXP objects still have a
>>          trailing zero byte so they are not handled. */
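The behaviour described in that comment can be modelled outside R with a few lines of C++. This is a hypothetical mock (`MockVector` and `mock_dataptr_chkzln` are illustrative names, not R's API), assuming only what the comment states: zero-length vectors get the address 1 instead of their data pointer. Note that address 1 does not satisfy the alignment requirement of `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector: a length and its storage.
struct MockVector {
    std::size_t length;
    double* data;
};

// Mimics the described CHKZLN behaviour: zero-length vectors yield the
// sentinel address 1, so a stray read or write typically faults instead
// of touching whatever memory follows the header.
void* mock_dataptr_chkzln(const MockVector& v) {
    if (v.length == 0)
        return reinterpret_cast<void*>(static_cast<std::uintptr_t>(1));
    assert(v.data != nullptr);
    return v.data;
}
```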
>>
>>         In the Rcpp context this leads to an inconsistency, where `REAL(x)`
>>     is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
>>     leads to ASAN warnings like
>>
>>     runtime error: reference binding to misaligned address 0x000000000001
>>     for type 'const double', which requires 8 byte alignment
>>     0x000000000001: note: pointer points here
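The diagnostic above is worded like the UndefinedBehaviorSanitizer alignment check: binding a `const double&` to `*p` is already undefined when `p` is the misaligned address 1, even if the value is never read. A defensive pattern on the consumer side is to check the length before forming any reference. `sum_values` and `first_or` are hypothetical helpers, shown only to illustrate that pattern.

```cpp
#include <cassert>
#include <cstddef>

// Sum n doubles starting at p. For n == 0 the loop body never runs, so p
// is never dereferenced and no reference to *p is ever formed.
double sum_values(const double* p, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// First element, or a fallback for empty input. Checking n first avoids
// touching *p at all when the vector is empty.
double first_or(const double* p, std::size_t n, double fallback) {
    if (n == 0)
        return fallback;
    assert(p != nullptr);
    return p[0];
}
```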
>>
>>          I'm in over my head and hoping for insight into whether this
>>     problem should be resolved by changing R, Rcpp, or downstream Rcpp
>>     packages ...
>>
>>         cheers
>>          Ben Bolker
>>
>>     ______________________________________________
>>     R-devel using r-project.org mailing list
>>     https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/

