[Rd] Feature Request: Allow Underscore Separated Numbers

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Fri Jul 15 20:26:49 CEST 2022


Thanks for posting that list.  The Python document is the only one I've 
read so far; it has a really nice summary 
(https://peps.python.org/pep-0515/#prior-art) of the differences in 
implementations among 10 languages.  Which choice would you recommend, 
and why?

  - I think Ivan's quick solution doesn't quite match any of them.
  - C, Fortran and C++ have special support in R, but none of them use 
underscore separators.
  - C++ does support separators, but uses "'", not "_", and some ancient 
forms of Fortran ignore embedded spaces.
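
To make the first point concrete, here is a minimal sketch of the stricter rule PEP 515 describes for decimal literals, where an underscore is accepted only between two digits. It is an illustration, not a proposed change to gram.y: the helper name underscores_between_digits is hypothetical, and it assumes the literal text has already been collected into a NUL-terminated string. It covers decimal digits only and ignores hex prefixes such as 0x.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of R's parser: returns true if every
 * underscore in the literal text `s` has an ASCII digit immediately
 * before and immediately after it. */
static bool underscores_between_digits(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != '_')
            continue;
        /* A leading underscore has no digit before it. */
        if (i == 0)
            return false;
        /* s[i + 1] is safe to read: at worst it is the terminating NUL,
         * which is not a digit, so a trailing underscore is rejected. */
        if (!isdigit((unsigned char)s[i - 1]) ||
            !isdigit((unsigned char)s[i + 1]))
            return false;
    }
    return true;
}
```

Under this rule "1_000_000" passes, while "1__000" and "1000_" are rejected; Ivan's patch, quoted below, accepts all three.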

Duncan Murdoch

On 15/07/2022 1:58 p.m., Jim Hester wrote:
> Allowing underscores in numeric literals is becoming a very common
> feature in programming languages. All of these languages (and more) now
> support it:
> 
> python: https://peps.python.org/pep-0515/
> javascript: https://v8.dev/features/numeric-separators
> julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> go: https://go.dev/ref/spec#Integer_literals
> 
> Its use in this context also dates back to at least Ada 83
> (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
> 
> Many other communities see the benefit of this feature, and I think R's
> community would benefit from it as well.
> 
> On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
>>
>> On Fri, 15 Jul 2022 11:25:32 -0400
>> <avi.e.gross using gmail.com> wrote:
>>
>>> R normally delays evaluation so chunks of code are handed over
>>> untouched to functions that often play with the text directly without
>>> evaluating it until, perhaps, much later.
>>
>> Do they play with the text, or with the syntax tree after it went
>> through the parser? While it's true that R saves the source text of the
>> functions for ease of debugging, it's not guaranteed that a given
>> object will have source references, and typical NSE functions operate
>> on language objects which are tree-like structures containing R values,
>> not source text.
>>
>> You are, of course, right that any changes to the syntax of the
>> language must be carefully considered, but if anyone wants to play with
>> this idea, it can be implemented in a very simple manner:
>>
>> --- src/main/gram.y     (revision 82598)
>> +++ src/main/gram.y     (working copy)
>> @@ -2526,7 +2526,7 @@
>>       YYTEXT_PUSH(c, yyp);
>>       /* We don't care about other than ASCII digits */
>>       while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
>> -          || c == 'x' || c == 'X' || c == 'L')
>> +          || c == 'x' || c == 'X' || c == 'L' || c == '_')
>>       {
>>          count++;
>>          if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
>> @@ -2533,6 +2533,9 @@
>>          {   YYTEXT_PUSH(c, yyp);
>>              break;
>>          }
>> +       if (c == '_') { /* allow an underscore anywhere inside the literal */
>> +           continue;
>> +       }
>>
>>          if (c == 'x' || c == 'X') {
>>              if (count > 2 || last != '0') break;  /* 0x must be first */
>>
>> To an NSE function, the underscored literals are indistinguishable from
>> normal ones, because it sees only the parsed numeric values, not the
>> literal text:
>>
>> stopifnot(all.equal(\() 1000000, \() 1_000_000))
>> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
>> f(1e6, 1_000_000)
>>
>> Although it's true that the source references change as a result:
>>
>> lapply(
>>   list(\() 1000000, \() 1_000_000),
>>   \(.) as.character(getSrcref(.))
>> )
>> # [[1]]
>> # [1] "\\() 1000000"
>> #
>> # [[2]]
>> # [1] "\\() 1_000_000"
>>
>> This patch is somewhat simplistic: it allows both multiple underscores
>> in succession and underscores at the end of the number literal. Perl
>> does so too, but with a warning:
>>
>> perl -wE'say "true" if 1__000_ == 1000'
>> # Misplaced _ in number at -e line 1.
>> # Misplaced _ in number at -e line 1.
>> # true
>>
>> --
>> Best regards,
>> Ivan
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel