[Rd] Should last default to .Machine$integer.max-1 for substring()

brodie gaslam brod|e@g@@|@m @end|ng |rom y@hoo@com
Mon Jun 21 03:40:11 CEST 2021


> On Sunday, June 20, 2021, 9:29:28 PM EDT, brodie gaslam via R-devel <r-devel using r-project.org> wrote:
>
>> On Sunday, June 20, 2021, 6:21:22 PM EDT, Michael Chirico <michaelchirico4 using gmail.com> wrote:
>>
>> The max width of a string is .Machine$integer.max-1:
>
> I think the max width is .Machine$integer.max.  What happened below is a
> bug due to buffer overflow in `strrep`:

Sorry, I meant integer overflow, not buffer overflow.
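
For what it's worth, the size in the 'Calloc' error quoted below is exactly
2^64 - 2^31. That is what you would get if a length of INT_MAX plus one extra
byte wrapped around to INT_MIN in 32-bit int arithmetic and was then converted
to a 64-bit size_t. The "+ 1" (for a NUL terminator) is my assumption; I have
not re-checked the strrep source. A minimal C sketch of that arithmetic, using
hypothetical helper names that are not from the R sources:

```c
#include <stdint.h>

/*
 * Signed overflow is undefined behaviour in C, so the 32-bit wraparound is
 * modelled explicitly in 64-bit arithmetic rather than actually triggered.
 */
static int64_t wrap_to_int32(int64_t v)
{
    /* Reduce modulo 2^32 into the range [-2^31, 2^31 - 1]. */
    const int64_t two32 = (int64_t)1 << 32;
    v %= two32;
    if (v > INT32_MAX) v -= two32;
    if (v < INT32_MIN) v += two32;
    return v;
}

/*
 * Hypothetical helper: the byte count an allocator would see if
 * "nchars + 1" (assumed: string length plus a NUL terminator) were
 * computed as a 32-bit int and then passed as a 64-bit unsigned size.
 */
static uint64_t requested_bytes(int32_t nchars)
{
    int64_t wrapped = wrap_to_int32((int64_t)nchars + 1);
    /* Negative-to-unsigned conversion is well defined: modulo 2^64. */
    return (uint64_t)wrapped;
}
```

Under these assumptions requested_bytes(INT32_MAX) evaluates to
18446744071562067968, the figure in the error message, while
requested_bytes(INT32_MAX - 1) gives 2147483647, which is consistent with the
.Machine$integer.max-1L case working.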

>> # works
>> x = strrep(" ", .Machine$integer.max-1L)
>> # fails
>> x = strrep(" ", .Machine$integer.max)
>> Error in strrep(" ", .Machine$integer.max) :
>>   'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
>> (see also the comment in src/main/character.c: "Character strings in R
>> are less than 2^31-1 bytes, so we use int not size_t.")
>
> FWIW WRE states:
>
>> Note that R character strings are restricted to 2^31 - 1 bytes
>
> This is INT_MAX or .Machine$integer.max, at least on machines for which
> `int` is 32 bits, which I think is typical for machines R builds on.  From
> having looked at the code a while ago I think WRE is right (so maybe the
> comment in the code is wrong), but it was a while ago and I haven't tried
> to allocate an INT_MAX long string.

So I tried it on a machine with more memory, and it works:

    > x <- strrep(" ", .Machine$integer.max-1L)
    > x <- paste0(x, " ")
    > nchar(x)
    [1] 2147483647
    > nchar(x) == .Machine$integer.max
    [1] TRUE

B.


