[Rd] R should add an API routine for safe use of memcpy(), memset() for use with 0-length SEXP
Michael Chirico
m|ch@e|ch|r|co4 @end|ng |rom gm@||@com
Thu Apr 24 18:35:32 CEST 2025
Is it correct to say the desired behavior in the long term is for such
an error to occur?
If so, could we make it so the --enable-strict-barrier (or other
option) behavior educes those errors for developers to opt in to
finding them sooner than later?
On Wed, Apr 23, 2025 at 3:43 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>
>
> On 4/24/25 00:18, Michael Chirico wrote:
> > In that case it seems like just erroring instead of returning invalid
> > pointers is a much friendlier option. Why give developers an unpinned
> > grenade to carry around?
>
> That would be too strict at this point. There is too much code around
> depending on that holding on to an invalid pointer (but not
> dereferencing it) is ok, and it is currently still working on the
> current platforms where R is used.
>
> The mechanism of returning an invalid pointer from functions like
> INTEGER() called on an empty vector (sometimes this is called
> "poisoning") is much more precise. When this causes a crash, it is
> because the program actually used the pointer, which is a clear error
> (and it is a clear error regardless of the platform). There are no
> false alarms: when package authors have to debug a package because of
> this, there is really an error in the code to be fixed. And, for a large
> number of CRAN packages where a recent improvement in the poisoning
> showed up more cases, the debugging was done by the CRAN team for the
> package authors, in some cases even providing patches.
>
> Best
> Tomas
>
> >
> > On Wed, Apr 23, 2025 at 1:38 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >> On 4/23/25 19:03, Michael Chirico wrote:
> >>> h/t Tim Taylor for pointing out my blindspot :)
> >>>
> >>> We have Memcpy() in API already [1], which wraps a 0-aware R_chk_memcpy() [2].
> >>>
> >>> We don't quite have Memset() in API, though; instead we have Memzero()
> >>> [3] for R_chk_memset(s, 0, n) which is 0-aware memset() [4].
> >> I don't think that using wrappers for this is the right approach. Even
> >> loading an invalid pointer is undefined behavior. While it wouldn't
> >> matter on typical hardware where R is used today, it could cause a crash
> >> on some hardware, and it may well be that various checkers will start
> >> warning about such things. Also, holding (intentionally) invalid
> >> pointers makes debugging harder.
> >>
> >> At least in new code, I would avoid holding invalid pointers. Certainly
> >> I don't think we should be adding wrappers to R for C functions to make
> >> them work with invalid pointers.
> >>
> >> Best
> >> Tomas
> >>
> >>> [1] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L59-L60
> >>> [2] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3575-L3580
> >>> [3] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L62-L63
> >>> [4] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3582-L3587
> >>>
> >>> Mike C
> >>>
> >>> On Wed, Apr 23, 2025 at 8:57 AM Michael Chirico
> >>> <michaelchirico4 using gmail.com> wrote:
> >>>> From R 4.5.0 [1], all builds of R discourage use of INTEGER() [and
> >>>> friends REAL(), ... and *_RO() equivalents] on length-0 SEXP [2].
> >>>> Before R 4.5.0, this was the behavior under --enable-strict-barrier.
> >>>>
> >>>> That means the following can segfault under strict builds (e.g.
> >>>> -fsanitize=alignment and -O0):
> >>>>
> >>>> SEXP x = PROTECT(Rf_allocVector(INTSXP, 0));
> >>>> SEXP y = PROTECT(Rf_allocVector(INTSXP, 0));
> >>>> const int *x = INTEGER_RO(x); // invalid!
> >>>> int *y = INTEGER(y);
> >>>> memcpy(y, x, 0); // alluring, but undefined behavior!
> >>>>
> >>>> There are a number of CRAN packages that fall victim to this, see e.g.
> >>>> this PR and others linked to it [3]. I'm sure there are dozens if not
> >>>> hundreds of other equivalent bugs waiting to be discovered that just
> >>>> aren't covered by existing tests.
> >>>>
> >>>> {rlang} took the approach to define r_memcpy() and r_memset() which
> >>>> wrap memcpy() and memset(), resp., with an added length-0 check [4]; I
> >>>> think R itself should offer these (probably more consistently styled
> >>>> as R_Memcpy() and R_Memset()).
> >>>>
> >>>> (NB there's a possibility I'm still not fully grasping what's going on here :) )
> >>>>
> >>>> Mike C
> >>>>
> >>>> [1] related: https://stat.ethz.ch/pipermail/r-devel/2024-June/083456.html
> >>>> [2] https://github.com/r-devel/r-svn/blob/2b29e52e1c4e3d26b649cb7ac320b8a3dd13de30/src/main/memory.c#L4146
> >>>> [3] https://github.com/r-lib/vctrs/pull/1968
> >>>> [4] https://github.com/r-lib/rlang/pull/1797
> >>> ______________________________________________
> >>> R-devel using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list