[Rd] Should last default to .Machine$integer.max-1 for substring()
Michael Chirico
m|ch@e|ch|r|co4 @end|ng |rom gm@||@com
Mon Jun 21 19:21:12 CEST 2021
Thanks all, great points, well taken. Indeed, it seems the default of
1000000 predates SVN tracking in 1997.
I think a NULL default behaving as "end of string" regardless of
encoding makes sense, and it avoids the overhead of both a $ call and a
much heavier nchar() calculation.
Mike C
On Mon, Jun 21, 2021 at 1:32 AM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Tomas Kalibera
> >>>>> on Mon, 21 Jun 2021 10:08:37 +0200 writes:
>
> > On 6/21/21 9:35 AM, Martin Maechler wrote:
> >>>>>>> Michael Chirico
> >>>>>>> on Sun, 20 Jun 2021 15:20:26 -0700 writes:
> >> > Currently, substring defaults to last=1000000L, which
> >> > strongly suggests the intent is to default to "nchar(x)"
> >> > without having to compute/allocate that up front.
> >>
> >> > Unfortunately, this default makes no sense for "very
> >> > large" strings which may exceed 1000000L in "width".
> >>
> >> Yes; and I tend to agree with you that this default is outdated
> >> (Remember : R was written to work and run on 2 (or 4?) MB of RAM on the
> >> student lab Macs in Auckland in ca 1994).
> >>
> >> > The max width of a string is .Machine$integer.max-1:
> >>
> >> (which Brodie showed was only almost true)
> >>
> >> > So it seems to me either .Machine$integer.max or
> >> > .Machine$integer.max-1L would be a more sensible default. Am I missing
> >> > something?
> >>
> >> The "drawback" is of course that .Machine$integer.max is still
> >> a function call (as R beginners may forget) contrary to <nnnnn>L,
> >> but that may even be inlined by the byte compiler (? how would we check ?)
> >> and even if it's not, it does more clearly convey the concept
> >> and idea *and* would probably even port automatically if ever
> >> integer would be increased in R.
>
> > We still have the problem that we need to count characters, not bytes,
> > if we want the default semantics of "until the end of the string".
>
> > I think we would have to fix this either by really using
> > "nchar(type="c")" or by using e.g. NULL and then treating this as a
> > special case, which would probably be faster.
>
> > Tomas
>
> You are right, as always, Tomas.
> I agree that would be better and we should do it if/when we change
> the default there.
>
> Martin