[Rd] strtoi output of empty string inconsistent across platforms

Martin Maechler m@echler @ending from @t@t@m@th@ethz@ch
Sat Jan 12 15:49:02 CET 2019


>>>>> Michael Chirico 
>>>>>     on Sat, 12 Jan 2019 17:34:03 +0800 writes:

    > Thanks Martin.  For what it's worth, this extremely
    > representative, highly scientific Twitter poll suggests
    > the Mac/Linux split is pretty stark (NA on Mac, 0 on
    > Linux)

    > https://twitter.com/michael_chirico/status/1083649190117306369?s=17

    > On Sat, Jan 12, 2019, 2:00 AM Martin Maechler
    > <maechler using stat.math.ethz.ch wrote:

    >> >>>>> Martin Maechler >>>>> on Fri, 11 Jan 2019 09:44:14
    >> +0100 writes:
    >> 
    >> >>>>> Michael Chirico >>>>> on Fri, 11 Jan 2019 14:36:17
    >> +0800 writes:
    >> 
    >> >> Identified as root cause of a bug in data.table: >>
    >> https://github.com/Rdatatable/data.table/issues/3267
    >> 
    >> >> On my machine, strtoi("", base = 2L) produces
    >> NA_integer_ >> (which seems consistent with ?strtoi:
    >> "Values which >> cannot be interpreted as integers or
    >> would overflow are >> returned as NA_integer_").
    >> 
    >> > indeed consistent with R's documentation on strtoi().
    >> > What machine would that be?
    >> 
    >> >> But on all the other machines I've seen, 0L is >>
    >> returned. This seems to be consistent with the output of
    >> >> a simple C program using the underlying strtol
    >> function >> (see data.table link for this program, and
    >> for full >> sessionInfo() of some environments with
    >> differing >> output).
    >> 
    >> >> So, what is the correct output of strtoi("", base =
    >> 2L)?
    >> 
    >> >> Is the cross-platform inconsistency to be >>
    >> expected/documentable?
    >> 
    >> > The inconsistency is certainly undesirable.  The
    >> relevant > utility function in R's source
    >> (<R>/src/main/character.c) > is
    >> 
    >> > static int strtoi(SEXP s, int base) { long int res;
    >> char > *endp;
    >> 
    >> > /* strtol might return extreme values on error */ >
    >> errno = 0;
    >> 
    >> > if(s == NA_STRING) return(NA_INTEGER); res = >
    >> strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || >
    >> *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res
    >> < > INT_MIN) res = NA_INTEGER; return (int) res; }
    >> 
    >> > and so it clearly is a platform-inconsistency in the >
    >> underlying C library's strtol().
    >> 
    >> (corrected typos here: )
    >> 
    >> > I think we should make this cross-platform consistent
    >> ...  > and indeed it makes much sense to ensure the
    >> result of
    >> 
    >> > strtoi("", base=2L) to become NA_integer_
    >> 
    >> > but chances are that would break code that has relied
    >> on > the current behavior {on "all but your computer"
    >> ;-)} ?
    >> 
    >> I still think that such a change should be done.
    >> 
    >> 'make check all' on the R source (+ Recommended packages)
    >> seems not to signal any error or warning with such a
    >> change, so I plan to commit that change to "the trunk" /
    >> "R-devel" soon, unless concerns are raised highly (and
    >> quickly enough).

I've now committed  svn rev 75977  to R-devel  which does change
strtoi("", <base>)  to return NA_integer_   (for all <base>)
platform independently, not even calling lib C's strtol() in
that case.

Martin



More information about the R-devel mailing list