[Rd] R should add an API routine for safe use of memcpy(), memset() for use with 0-length SEXP
Tomas Kalibera
tom@@@k@||ber@ @end|ng |rom gm@||@com
Tue Apr 29 23:45:46 CEST 2025
On 4/24/25 18:35, Michael Chirico wrote:
> Is it correct to say the desired behavior in the long term is for such
> an error to occur?
I would say not necessarily, and certainly no such decision has been
made. Another option would be that it would eventually return a null
pointer (as a data pointer to an empty vector), and holding such pointer
is fine. One just cannot dereference it.
> If so, could we make it so the --enable-strict-barrier (or other
> option) behavior educes those errors for developers to opt in to
> finding them sooner than later?
As above - it may not become an error at all. Also - this would be hard
doing in isolation. Say that you as a package maintainer wanted to make
sure that you never grab a data pointer to an empty vector. So you
enable such a theoretical option. But your package has dependencies
which may not necessarily want to be that strict, and may not work with
such strict checking.
If you are decided you would never want to grab a data pointer to an
empty vector in your package, you could create custom wrappers of the
corresponding macros / functions that get the data pointer. But I
wouldn't do even that, I think that wrappers of the R API make the code
harder to read. One could still do it somehow only for testing. And
certainly one could make sure that package tests also use extreme
inputs, such as empty vectors. That would have the benefit of testing
also other code, not just non-referencing of such pointers.
Best
Tomas
>
> On Wed, Apr 23, 2025 at 3:43 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>
>> On 4/24/25 00:18, Michael Chirico wrote:
>>> In that case it seems like just erroring instead of returning invalid
>>> pointers is a much friendlier option. Why give developers an unpinned
>>> grenade to carry around?
>> That would be too strict at this point. There is too much code around
>> depending on that holding on to an invalid pointer (but not
>> dereferencing it) is ok, and it is currently still working on the
>> current platforms where R is used.
>>
>> The mechanism of returning an invalid pointer from functions like
>> INTEGER() called on an empty vector (sometimes this is called
>> "poisoning") is much more precise. When this causes a crash, it is
>> because the program actually used the pointer, which is a clear error
>> (and it is a clear error regardless of the platform). There are no
>> false alarms: when package authors have to debug a package because of
>> this, there is really an error in the code to be fixed. And, for a large
>> number of CRAN packages where a recent improvement in the poisoning
>> showed up more cases, the debugging was done by the CRAN team for the
>> package authors, in some cases even providing patches.
>>
>> Best
>> Tomas
>>
>>> On Wed, Apr 23, 2025 at 1:38 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>>> On 4/23/25 19:03, Michael Chirico wrote:
>>>>> h/t Tim Taylor for pointing out my blindspot :)
>>>>>
>>>>> We have Memcpy() in API already [1], which wraps a 0-aware R_chk_memcpy() [2].
>>>>>
>>>>> We don't quite have Memset() in API, though; instead we have Memzero()
>>>>> [3] for R_chk_memset(s, 0, n) which is 0-aware memset() [4].
>>>> I don't think that using wrappers for this is the right approach. Even
>>>> loading an invalid pointer is undefined behavior. While it wouldn't
>>>> matter on typical hardware where R is used today, it could cause a crash
>>>> on some hardware, and it may well be that various checkers will start
>>>> warning about such things. Also, holding (intentionally) invalid
>>>> pointers makes debugging harder.
>>>>
>>>> At least in new code, I would avoid holding invalid pointers. Certainly
>>>> I don't think we should be adding wrappers to R for C functions to make
>>>> them work with invalid pointers.
>>>>
>>>> Best
>>>> Tomas
>>>>
>>>>> [1] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L59-L60
>>>>> [2] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3575-L3580
>>>>> [3] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/include/R_ext/RS.h#L62-L63
>>>>> [4] https://github.com/r-devel/r-svn/blob/9e597ca8132e8b6298b0bedb39fa8deba5c819df/src/main/memory.c#L3582-L3587
>>>>>
>>>>> Mike C
>>>>>
>>>>> On Wed, Apr 23, 2025 at 8:57 AM Michael Chirico
>>>>> <michaelchirico4 using gmail.com> wrote:
>>>>>> From R 4.5.0 [1], all builds of R discourage use of INTEGER() [and
>>>>>> friends REAL(), ... and *_RO() equivalents] on length-0 SEXP [2].
>>>>>> Before R 4.5.0, this was the behavior under --enable-strict-barrier.
>>>>>>
>>>>>> That means the following can segfault under strict builds (e.g.
>>>>>> -fsanitize=alignment and -O0):
>>>>>>
>>>>>> SEXP x = PROTECT(Rf_allocVector(INTSXP, 0));
>>>>>> SEXP y = PROTECT(Rf_allocVector(INTSXP, 0));
>>>>>> const int *x = INTEGER_RO(x); // invalid!
>>>>>> int *y = INTEGER(y);
>>>>>> memcpy(y, x, 0); // alluring, but undefined behavior!
>>>>>>
>>>>>> There are a number of CRAN packages that fall victim to this, see e.g.
>>>>>> this PR and others linked to it [3]. I'm sure there are dozens if not
>>>>>> hundreds of other equivalent bugs waiting to be discovered that just
>>>>>> aren't covered by existing tests.
>>>>>>
>>>>>> {rlang} took the approach to define r_memcpy() and r_memset() which
>>>>>> wrap memcpy() and memset(), resp., with an added length-0 check [4]; I
>>>>>> think R itself should offer these (probably more consistently styled
>>>>>> as R_Memcpy() and R_Memset()).
>>>>>>
>>>>>> (NB there's a possibility I'm still not fully grasping what's going on here :) )
>>>>>>
>>>>>> Mike C
>>>>>>
>>>>>> [1] related: https://stat.ethz.ch/pipermail/r-devel/2024-June/083456.html
>>>>>> [2] https://github.com/r-devel/r-svn/blob/2b29e52e1c4e3d26b649cb7ac320b8a3dd13de30/src/main/memory.c#L4146
>>>>>> [3] https://github.com/r-lib/vctrs/pull/1968
>>>>>> [4] https://github.com/r-lib/rlang/pull/1797
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list