[Rd] [External] Re: changes in R-devel and zero-extent objects in Rcpp

Hiroaki Yutani yut@n|@|n| @end|ng |rom gm@||@com
Sun Jun 9 12:10:05 CEST 2024


(Mainly reply to self)

I found there's a new API VECTOR_PTR_RO. Thanks for adding this!

https://github.com/r-devel/r-svn/commit/d499fab95b1ba23ee7842293030d4af1e69ae0fe

Best,
Yutani

2024年6月9日(日) 14:13 Hiroaki Yutani <yutani.ini using gmail.com>:

> Sorry to ask about a bit drifted topic, but will there be an alternative
> API to DATAPTR?
>
> > DATAPTR is not in the API and can't be at least in this form
>
> I believe it's vital for ALTREP to return the pointer to the expanded
> version of a SEXP just like the implementation in base R does [1].
> At least, VECSXP has no other measure to expose the pointer if I
> understand correctly.
>
> Best,
> Yutani
>
> [1]:
> https://github.com/r-devel/r-svn/blob/a3508b75d28164b0e5bcb2c87f816ce5169729a4/src/main/altclasses.c#L186
>
>
> 2024年6月9日(日) 10:43 luke-tierney--- via R-devel <r-devel using r-project.org>:
>
>> On Sat, 8 Jun 2024, Ben Bolker wrote:
>>
>> >  The ASAN errors occur *even if the zero-length object is not actually
>> > accessed*/is used in a perfectly correct manner, i.e. it's perfectly
>> legal in
>> > base R to define `m <- numeric(0)` or `m <- matrix(nrow = 0, ncol =
>> 0)`,
>> > whereas doing the equivalent in Rcpp will (now) lead to an ASAN error.
>> >
>> >  i.e., these are *not* previously cryptic out-of-bounds accesses that
>> are
>> > now being revealed, but instead sensible and previously legal
>> definitions of
>> > zero-length objects that are now causing problems.
>> >
>> >   I'm pretty sure I'm right about this, but it's absolutely possible
>> that
>> > I'm just confused at this point; I don't have a super-simple example to
>> show
>> > you at the moment. The closest is this example by Mikael Jagan:
>> > https://github.com/lme4/lme4/issues/794#issuecomment-2155093049
>> >
>> >  which shows that if x is a pointer to a zero-length vector (in plain
>> C++
>> > for R, no Rcpp is involved), DATAPTR(x) and REAL(x) evaluate to
>> different
>> > values.
>> >
>> >  Mikael further points out that "Rcpp seems to cast a (void *) returned
>> by
>> > DATAPTR to (double *) when constructing a Vector<REALSXP> from a SEXP,
>> rather
>> > than using the (double *) returned by REAL." So perhaps R-core doesn't
>> want
>> > to guarantee that these operations give identical answers, in which
>> case Rcpp
>> > will have to change the way it does things ...
>>
>> It looks like REAL and friends should also get this check, but it's
>> not high priority at this point, at least to me. DATAPTR has been
>> using this check for a while in a barrier build, so you might want to
>> test there as well. I expect we will activate more integrity checks
>> from the barrier build on the API client side as things are tidied up.
>>
>> However: DATAPTR is not in the API and can't be at least in this form:
>> It allows access to a writable pointer to STRSXP and VECSXP data and
>> that is too dangerous for memory manager integrity. I'm not sure
>> exactly how this will be resolve, but be prepared for changes.
>>
>> Best,
>>
>> luke
>>
>> >
>> >  cheers
>> >   Ben
>> >
>> >
>> >
>> > On 2024-06-08 6:39 p.m., Kevin Ushey wrote:
>> >> IMHO, this should be changed in both Rcpp and downstream packages:
>> >>
>> >> 1. Rcpp could check for out-of-bounds accesses in cases like these,
>> and
>> >> emit an R warning / error when such an access is detected;
>> >>
>> >> 2. The downstream packages unintentionally making these out-of-bounds
>> >> accesses should be fixed to avoid doing that.
>> >>
>> >> That is, I think this is ultimately a bug in the affected packages,
>> but
>> >> Rcpp could do better in detecting and handling this for client
>> packages
>> >> (avoiding a segfault).
>> >>
>> >> Best,
>> >> Kevin
>> >>
>> >>
>> >> On Sat, Jun 8, 2024, 3:06 PM Ben Bolker <bbolker using gmail.com
>> >> <mailto:bbolker using gmail.com>> wrote:
>> >>
>> >>
>> >>          A change to R-devel (SVN r86629 or
>> >>
>> https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
>> >> <
>> https://github.com/r-devel/r-svn/commit/92c1d5de23c93576f55062e26d446feface07250
>> >
>> >>     has changed the handling of pointers to zero-length objects,
>> leading to
>> >>     ASAN issues with a number of Rcpp-based packages (the commit
>> message
>> >>     reads, in part, "Also define STRICT_TYPECHECK when compiling
>> >>     inlined.c.")
>> >>
>> >>         I'm interested in discussion from the community.
>> >>
>> >>         Details/diagnosis for the issues in the lme4 package are here:
>> >>     https://github.com/lme4/lme4/issues/794
>> >>     <https://github.com/lme4/lme4/issues/794>,
>> >> with a bit more discussion
>> >>     about how zero-length objects should be handled.
>> >>
>> >>         The short(ish) version is that r86629 enables the
>> >>     CATCH_ZERO_LENGTH_ACCESS definition. This turns on the CHKZLN macro
>> >>     <
>> https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104
>> >> <
>> https://github.com/r-devel/r-svn/blob/4ef83b9dc3c6874e774195d329cbb6c11a71c414/src/main/memory.c#L4090-L4104
>> >>,
>> >>     which returns a trivial pointer (rather than the data pointer that
>> >>     would
>> >>     be returned in the normal control flow) if an object has length 0:
>> >>
>> >>     /* Attempts to read or write elements of a zero length vector will
>> >>          result in a segfault, rather than read and write random
>> memory.
>> >>          Returning NULL would be more natural, but Matrix seems to
>> assume
>> >>          that even zero-length vectors have non-NULL data pointers, so
>> >>          return (void *) 1 instead. Zero-length CHARSXP objects still
>> have
>> >> a
>> >>          trailing zero byte so they are not handled. */
>> >>
>> >>         In the Rcpp context this leads to an inconsistency, where
>> `REAL(x)`
>> >>     is a 'real' external pointer and `DATAPTR(x)` is 0x1, which in turn
>> >>     leads to ASAN warnings like
>> >>
>> >>     runtime error: reference binding to misaligned address
>> 0x000000000001
>> >>     for type 'const double', which requires 8 byte alignment
>> >>     0x000000000001: note: pointer points here
>> >>
>> >>          I'm in over my head and hoping for insight into whether this
>> >>     problem
>> >>     should be resolved by changing R, Rcpp, or downstream Rcpp
>> packages ...
>> >>
>> >>         cheers
>> >>          Ben Bolker
>> >>
>> >>     ______________________________________________
>> >>     R-devel using r-project.org <mailto:R-devel using r-project.org> mailing list
>> >>     https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>     <https://stat.ethz.ch/mailman/listinfo/r-devel>
>> >>
>> >
>> >
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>     Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list