[Rd] [External] Vector underflow [-1] in sort(method="radix", na.last=NA)
luke-tierney using uiowa.edu
Mon Dec 22 22:08:51 CET 2025
On Sun, 21 Dec 2025, Ivan Krylov via R-devel wrote:
> Hello R-devel,
>
> Some inputs cause sort(method="radix") to try to read vectors at index
> -1, which is caught for character vectors on some builds that use clang
> -fsanitize=address since r89198:
>
> podman run --rm -it \
> registry.gitlab.com/rdatatable/dockerfiles/r-devel-clang-san \
> R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
> # Error in order(NA_character_, "c", method = "radix", na.last = NA) :
> # attempt access index -1/1 in STRING_ELT
With a build configured with --enable-strict-barrier, most base calls
will use the non-inlined version, so on my setup:
luke@MacBook-Air-102 build% ../barrier/bin/R -q -s -e "order(NA_character_, 'c', method = 'radix', na.last = NA)"
Error in order(NA_character_, "c", method = "radix", na.last = NA) :
attempt access index -1/1 in STRING_ELT
Execution halted
>
> Since savetl_end() did not run, some CHARSXPs retain their altered
> TRUELENGTHs. The R session is then likely to crash when it tries to
> read a negative-numbered hash bucket (usually during install() while
> lazy-loading bytecode for another function call, e.g., when wrapping
> the order() call in try()).
>
> This seems to be a matter of catching elements already sorted as NA on
> a previous pass:
>
> Index: src/main/radixsort.c
> ===================================================================
> --- src/main/radixsort.c (revision 89211)
> +++ src/main/radixsort.c (working copy)
> @@ -1766,7 +1766,9 @@
> // this edge case had to be taken care of
> // here.. (see the bottom of this file for
> // more explanation)
> - switch (TYPEOF(x)) {
> + if (o[i] == 0) { // already sorted as NA
> + isSorted = false;
> + } else switch (TYPEOF(x)) {
> case INTSXP:
> if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
> isSorted = false;
>
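To make the idea behind the patch concrete, here is a minimal standalone
sketch, not the code in radixsort.c. The helper name order_still_sorted and
the NA_INT macro are hypothetical; the sketch assumes o[] holds 1-based
positions and that an entry of 0 marks an element already dropped as NA on
an earlier pass, as the patch comment describes.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for R's NA_INTEGER (an assumption for this sketch). */
#define NA_INT INT_MIN

/*
 * Hypothetical helper: returns true only if no element of the order
 * vector o[] refers to an NA. o[] is 1-based; o[i] == 0 means the
 * element was already removed as NA on a previous pass.
 */
bool order_still_sorted(const int *x, const int *o, int n)
{
    for (int i = 0; i < n; i++) {
        if (o[i] == 0)
            return false;          /* checked first: never read x[-1] */
        if (x[o[i] - 1] == NA_INT)
            return false;          /* NA found on this pass */
    }
    return true;
}
```

The point is only the ordering of the two checks: testing o[i] == 0 before
computing o[i] - 1 is what keeps the index from becoming -1.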
> I don't entirely understand what causes src/main/radixsort.c to call
> the non-inlined version of STRING_ELT in some cases.
`inline` is only a hint to the compiler; some compilers ignore the
hint more often than others.
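As a rough illustration of why the two variants behave differently, here is
a sketch of an unchecked inline accessor next to a bounds-checked function.
Both names (elt_unchecked, elt_checked) are hypothetical and this is not R's
implementation of STRING_ELT; the message format just mirrors the error
shown above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical unchecked accessor: if the compiler inlines it, an
 * index of -1 is simply read, with undefined behaviour. */
static inline const char *elt_unchecked(const char *const *v, long i)
{
    return v[i];
}

/* Hypothetical checked accessor: returns NULL and writes a message
 * into msg when i is outside [0, n). */
const char *elt_checked(const char *const *v, long n, long i,
                        char *msg, size_t msglen)
{
    if (i < 0 || i >= n) {
        snprintf(msg, msglen, "attempt access index %ld/%ld", i, n);
        return NULL;
    }
    if (msglen > 0)
        msg[0] = '\0';             /* no error: clear the message */
    return v[i];
}
```

Only a build that actually goes through the checked function gets to report
the bad index; the inlined path reads out of bounds silently unless
something like a sanitizer catches it.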
This code was originally contributed by data.table. I believe Michael
Lawrence handled the integration at the time. There were a number of
issues like this early on that were resolved on the R side and I
believe contributed back to data.table. If you have the energy it
might be good to compare the two now and see if there are things that
should be ported from one to the other.
Best,
luke
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney using uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu