[Rd] Feature Request: Allow Underscore Separated Numbers
Jim Hester
j@me@@|@he@ter @end|ng |rom gm@||@com
Fri Jul 15 19:58:35 CEST 2022
Allowing underscores in numeric literals is becoming a very common
feature in computing languages. All of these languages (and more) now
support it:
python: https://peps.python.org/pep-0515/
javascript: https://v8.dev/features/numeric-separators
julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
go: https://go.dev/ref/spec#Integer_literals
Its use in this context also dates back to at least Ada 83
(http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
Many other communities see the benefit of this feature, and I think R's
community would benefit from it as well.
On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
>
> On Fri, 15 Jul 2022 11:25:32 -0400
> <avi.e.gross using gmail.com> wrote:
>
> > R normally delays evaluation so chunks of code are handed over
> > untouched to functions that often play with the text directly without
> > evaluating it until, perhaps, much later.
>
> Do they play with the text, or with the syntax tree after it went
> through the parser? While it's true that R saves the source text of the
> functions for ease of debugging, it's not guaranteed that a given
> object will have source references, and typical NSE functions operate
> on language objects which are tree-like structures containing R values,
> not source text.
>
> You are, of course, right that any changes to the syntax of the
> language must be carefully considered, but if anyone wants to play with
> this idea, it can be implemented in a very simple manner:
>
> --- src/main/gram.y (revision 82598)
> +++ src/main/gram.y (working copy)
> @@ -2526,7 +2526,7 @@
>      YYTEXT_PUSH(c, yyp);
>      /* We don't care about other than ASCII digits */
>      while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> -           || c == 'x' || c == 'X' || c == 'L')
> +           || c == 'x' || c == 'X' || c == 'L' || c == '_')
>      {
>          count++;
>          if (c == 'L') /* must be at the end. Won't allow 1Le3 (at present). */
> @@ -2533,6 +2533,9 @@
>          { YYTEXT_PUSH(c, yyp);
>              break;
>          }
> +        if (c == '_') { /* allow an underscore anywhere inside the literal */
> +            continue;
> +        }
>
>          if (c == 'x' || c == 'X') {
>              if (count > 2 || last != '0') break; /* 0x must be first */
>
> To an NSE function, the underscored literals are indistinguishable from
> normal ones, because they don't see the literals:
>
> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> f(1e6, 1_000_000)
>
> Although it's true that the source references change as a result:
>
> lapply(
>   list(\() 1000000, \() 1_000_000),
>   \(.) as.character(getSrcref(.))
> )
> # [[1]]
> # [1] "\\() 1000000"
> #
> # [[2]]
> # [1] "\\() 1_000_000"
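
A minimal C sketch of why the parsed values match while the source text differs, assuming the scanner drops underscores before the token is converted (as the `continue` in the patch appears to do). `strip_underscores` and `literal_value` are hypothetical helper names, not functions from R's sources, and the standard `strtod` stands in for R's own conversion routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, skipping every '_'.
   dst must have room for at least strlen(src) + 1 bytes. */
void strip_underscores(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src != '_')
            *dst++ = *src;
        src++;
    }
    *dst = '\0';
}

/* Hypothetical helper: numeric value of a literal once the
   underscores are gone. Returns 0.0 if allocation fails. */
double literal_value(const char *text)
{
    char *buf = malloc(strlen(text) + 1);
    double value;

    if (buf == NULL)
        return 0.0;
    strip_underscores(text, buf);
    value = strtod(buf, NULL);  /* stand-in for R's own conversion */
    free(buf);
    return value;
}
```

Under these assumptions, `literal_value("1_000_000")` and `literal_value("1e6")` return the same double, which mirrors the `f(1e6, 1_000_000)` check above. The original spelling survives only in the saved source text.
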
>
> This patch is somewhat simplistic: it allows both multiple underscores
> in succession and underscores at the end of the number literal. Perl
> does so too, but with a warning:
>
> perl -wE'say "true" if 1__000_ == 1000'
> # Misplaced _ in number at -e line 1.
> # Misplaced _ in number at -e line 1.
> # true
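
A stricter rule, similar in spirit to Python's PEP 515, would accept an underscore only when it sits between two digits. The sketch below shows such a check as a standalone C function. `underscores_well_placed` is a hypothetical name, and the function is not part of the patch or of R's sources. It treats only decimal digits as digits, so hexadecimal literals would need an extended test.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true when every '_' in text sits directly
   between two decimal digits, so "1__000" and "1000_" are rejected. */
bool underscores_well_placed(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] != '_')
            continue;
        /* A leading underscore has no digit before it. */
        if (i == 0 || !isdigit((unsigned char)text[i - 1]))
            return false;
        /* A trailing or doubled underscore has no digit after it. */
        if (!isdigit((unsigned char)text[i + 1]))
            return false;
    }
    return true;
}
```

With this rule, `1_000_000` would be accepted, while the `1__000_` from the Perl example would be rejected outright instead of producing a warning.
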
>
> --
> Best regards,
> Ivan