[Rd] Should last default to .Machine$integer.max-1 for substring()
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jun 21 10:32:30 CEST 2021
>>>>> Tomas Kalibera
>>>>> on Mon, 21 Jun 2021 10:08:37 +0200 writes:
> On 6/21/21 9:35 AM, Martin Maechler wrote:
>>>>>>> Michael Chirico
>>>>>>> on Sun, 20 Jun 2021 15:20:26 -0700 writes:
>> > Currently, substring defaults to last=1000000L, which
>> > strongly suggests the intent is to default to "nchar(x)"
>> > without having to compute/allocate that up front.
>>
>> > Unfortunately, this default makes no sense for "very
>> > large" strings which may exceed 1000000L in "width".
>>
>> Yes; and I tend to agree with you that this default is outdated
>> (Remember: R was written to work and run on 2 (or 4?) MB of RAM on the
>> student lab Macs in Auckland in ca. 1994).
>>
>> > The max width of a string is .Machine$integer.max-1:
>>
>> (which Brodie showed was only almost true)
>>
>> > So it seems to me either .Machine$integer.max or
>> > .Machine$integer.max-1L would be a more sensible default. Am I missing
>> > something?
>>
>> The "drawback" is of course that .Machine$integer.max is still
>> a function call (as R beginners may forget), unlike <nnnnn>L,
>> but it may even be inlined by the byte compiler (how would we check?),
>> and even if it is not, it conveys the concept and idea more clearly
>> *and* would probably even port automatically if the integer range
>> were ever increased in R.
> We still have the problem that we need to count characters, not bytes,
> if we want the default semantics of "until the end of the string".
> I think we would have to fix this either by really using
> "nchar(type="c")" or by using e.g. NULL and then treating this as a
> special case, which would probably be faster.
> Tomas
You are right, as always, Tomas.
I agree that would be better and we should do it if/when we change
the default there.
Martin
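
For illustration only, here is a minimal R-level sketch of the NULL idea
discussed above. The name substring_to_end is hypothetical and not part of
base R, and the thread suggests the special case would more likely be handled
internally for speed; this version simply falls back to nchar(type = "chars")
so that the end of the string is counted in characters, not bytes.

```r
# Hypothetical helper (not base R): last = NULL means "to the end of the string".
substring_to_end <- function(text, first, last = NULL) {
  if (is.null(last)) {
    # Count characters, not bytes, so multibyte strings are handled correctly.
    last <- nchar(text, type = "chars")
  }
  substring(text, first, last)
}

x <- strrep("a", 1500000L)            # longer than the 1000000L default
nchar(substring(x, 1L, 1000000L))     # capped at 1000000 characters
nchar(substring_to_end(x, 1L))        # all 1500000 characters

y <- "na\u00efve caf\u00e9"           # contains non-ASCII characters
substring_to_end(y, 7L)               # suffix selected by character position
```

The cost of this approach is the extra nchar() pass over every element, which
is the overhead a fixed default such as 1000000L was presumably meant to avoid.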