[Rd] Feature Request: Allow Underscore Separated Numbers
Bill Dunlap
w||||@mwdun|@p @end|ng |rom gm@||@com
Fri Jul 15 21:34:24 CEST 2022
The token '._1' (period underscore digit) is currently parsed as a symbol
(name). It would become a number if underscore were ignored as in the
first proposal. The just-between-digits alternative would avoid this
change.
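
To make the difference concrete, here is a rough sketch in Python (chosen only because PEP 515 is already cited in this thread and the proposed syntax cannot be run in R today). It is not R's tokenizer: PLAIN_NUMBER is a simplified stand-in for R's numeric grammar (no hex, no 'L' suffix), and both function names are hypothetical, made up for illustration.

```python
import re

# Assumption: a simplified decimal grammar, not R's real one.
# Accepts forms such as 12, 12., 12.5, .5, 1e6, 1.5E-3.
PLAIN_NUMBER = re.compile(r"(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def is_number_ignore_anywhere(token: str) -> bool:
    """First proposal: ignore every underscore, then test what is left."""
    return PLAIN_NUMBER.fullmatch(token.replace("_", "")) is not None


def is_number_between_digits(token: str) -> bool:
    """Alternative: an underscore is a separator only between two digits."""
    # Remove only underscores that have a digit on both sides.
    stripped = re.sub(r"(?<=\d)_(?=\d)", "", token)
    # Any underscore still present was misplaced, so this is not a number.
    return "_" not in stripped and PLAIN_NUMBER.fullmatch(stripped) is not None


if __name__ == "__main__":
    for tok in ["._1", "1_000_000", "1__000_"]:
        print(tok, is_number_ignore_anywhere(tok), is_number_between_digits(tok))
```

Under these assumptions '._1' counts as a number only with the ignore-anywhere rule, while '1_000_000' is accepted by both rules.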
-Bill
On Fri, Jul 15, 2022 at 12:26 PM Jim Hester <james.f.hester using gmail.com>
wrote:
> I think keeping it simple and less restrictive is the best approach,
> for ease of implementation, limiting future maintenance, and so users
> have the flexibility to format these however they wish. So I would
> probably lean towards allowing multiple delimiters anywhere (including
> trailing) or possibly just between digits.
>
> On Fri, Jul 15, 2022 at 2:26 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
> >
> > Thanks for posting that list. The Python document is the only one I've
> > read so far; it has a really nice summary
> > (https://peps.python.org/pep-0515/#prior-art) of the differences in
> > implementations among 10 languages. Which choice would you recommend,
> > and why?
> >
> > - I think Ivan's quick solution doesn't quite match any of them.
> > - C, Fortran and C++ have special support in R, but none of them use
> > underscore separators.
> > - C++ does support separators, but uses "'", not "_", and some ancient
> > forms of Fortran ignore embedded spaces.
> >
> > Duncan Murdoch
> >
> > On 15/07/2022 1:58 p.m., Jim Hester wrote:
> > > Allowing underscores in numeric literals is becoming a very common
> > > feature in computing languages. All of these languages (and more) now
> > > > support it:
> > >
> > > python: https://peps.python.org/pep-0515/
> > > javascript: https://v8.dev/features/numeric-separators
> > > > julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> > > > java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> > > > ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> > > > perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> > > > rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> > > > C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> > > go: https://go.dev/ref/spec#Integer_literals
> > >
> > > Its use in this context also dates back to at least Ada 83
> > > (
> http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal
> .)
> > >
> > > > Many other communities see the benefit of this feature, and I think R's
> > > > community would benefit from it as well.
> > >
> > > > On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
> > >>
> > >> On Fri, 15 Jul 2022 11:25:32 -0400
> > >> <avi.e.gross using gmail.com> wrote:
> > >>
> > >>> R normally delays evaluation so chunks of code are handed over
> > >>> untouched to functions that often play with the text directly without
> > >>> evaluating it until, perhaps, much later.
> > >>
> > > >> Do they play with the text, or with the syntax tree after it went
> > > >> through the parser? While it's true that R saves the source text of the
> > > >> functions for ease of debugging, it's not guaranteed that a given
> > > >> object will have source references, and typical NSE functions operate
> > > >> on language objects, which are tree-like structures containing R values,
> > > >> not source text.
> > >>
> > > >> You are, of course, right that any changes to the syntax of the
> > > >> language must be carefully considered, but if anyone wants to play with
> > > >> this idea, it can be implemented in a very simple manner:
> > >>
> > >> --- src/main/gram.y (revision 82598)
> > >> +++ src/main/gram.y (working copy)
> > >> @@ -2526,7 +2526,7 @@
> > >> YYTEXT_PUSH(c, yyp);
> > >> /* We don't care about other than ASCII digits */
> > >> while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> > >> - || c == 'x' || c == 'X' || c == 'L')
> > >> + || c == 'x' || c == 'X' || c == 'L' || c == '_')
> > >> {
> > >> count++;
> > > >> if (c == 'L') /* must be at the end. Won't allow 1Le3 (at present). */
> > >> @@ -2533,6 +2533,9 @@
> > >> { YYTEXT_PUSH(c, yyp);
> > >> break;
> > >> }
> > > >> + if (c == '_') { /* allow an underscore anywhere inside the literal */
> > >> + continue;
> > >> + }
> > >>
> > >> if (c == 'x' || c == 'X') {
> > > >> if (count > 2 || last != '0') break; /* 0x must be first */
> > >>
> > > >> To an NSE function, the underscored literals are indistinguishable from
> > > >> normal ones, because NSE functions see the parsed values, not the
> > > >> literal text:
> > >>
> > >> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> > >> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> > >> f(1e6, 1_000_000)
> > >>
> > >> Although it's true that the source references change as a result:
> > >>
> > >> lapply(
> > >> list(\() 1000000, \() 1_000_000),
> > >> \(.) as.character(getSrcref(.))
> > >> )
> > >> # [[1]]
> > >> # [1] "\\() 1000000"
> > >> #
> > >> # [[2]]
> > >> # [1] "\\() 1_000_000"
> > >>
> > >> This patch is somewhat simplistic: it allows both multiple underscores
> > >> in succession and underscores at the end of the number literal. Perl
> > >> does so too, but with a warning:
> > >>
> > >> perl -wE'say "true" if 1__000_ == 1000'
> > >> # Misplaced _ in number at -e line 1.
> > >> # Misplaced _ in number at -e line 1.
> > >> # true
> > >>
> > >> --
> > >> Best regards,
> > >> Ivan
> > >>
> > >> ______________________________________________
> > >> R-devel using r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-devel