[Rd] Feature Request: Allow Underscore Separated Numbers

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Fri Jul 15 19:21:31 CEST 2022


On Fri, 15 Jul 2022 11:25:32 -0400
<avi.e.gross using gmail.com> wrote:

> R normally delays evaluation so chunks of code are handed over
> untouched to functions that often play with the text directly without
> evaluating it until, perhaps, much later.

Do they play with the text, or with the syntax tree produced by the
parser? While it's true that R keeps the source text of functions for
ease of debugging, a given object is not guaranteed to have source
references, and typical NSE functions operate on language objects,
which are tree-like structures containing R values, not source text.
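
To illustrate the distinction, here is a minimal sketch in stock R (no
patch needed). The names capture, with_src and without_src are made up
for this example; the only assumption is that parse() is called with an
explicit keep.source argument:

```r
# Source references are optional: they exist only when the parser is
# asked to keep them.
with_src    <- parse(text = "f(1e6)", keep.source = TRUE)
without_src <- parse(text = "f(1e6)", keep.source = FALSE)
has_srcref  <- !is.null(attr(with_src, "srcref"))
no_srcref   <- is.null(attr(without_src, "srcref"))

# A hypothetical NSE function: it receives its argument unevaluated,
# as a language object rather than as text.
capture <- function(x) substitute(x)
e <- capture(g(1e6, y))

is.call(e)          # the argument arrives as a call object
is.numeric(e[[2]])  # its first argument is already a parsed double

# Two spellings of the same number yield identical objects, so the
# original spelling cannot be recovered from the language object.
same_value <- identical(quote(1e6), quote(1000000))
```

The spelling of a literal survives only in the srcref attribute, when
there is one.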

You are, of course, right that any changes to the syntax of the
language must be carefully considered, but if anyone wants to play with
this idea, it can be implemented in a very simple manner:

--- src/main/gram.y	(revision 82598)
+++ src/main/gram.y	(working copy)
@@ -2526,7 +2526,7 @@
     YYTEXT_PUSH(c, yyp);
     /* We don't care about other than ASCII digits */
     while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
-	   || c == 'x' || c == 'X' || c == 'L')
+	   || c == 'x' || c == 'X' || c == 'L' || c == '_')
     {
 	count++;
 	if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
@@ -2533,6 +2533,9 @@
 	{   YYTEXT_PUSH(c, yyp);
 	    break;
 	}
+	if (c == '_') { /* allow an underscore anywhere inside the literal */
+	    continue;
+	}
 	
 	if (c == 'x' || c == 'X') {
 	    if (count > 2 || last != '0') break;  /* 0x must be first */

To an NSE function, the underscored literals are indistinguishable from
normal ones, because the function never sees how a literal was spelled,
only the numeric value the parser produced:

stopifnot(all.equal(\() 1000000, \() 1_000_000))
f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
f(1e6, 1_000_000)

The source references, however, do change as a result:

lapply(
 list(\() 1000000, \() 1_000_000),
 \(.) as.character(getSrcref(.))
)
# [[1]]
# [1] "\\() 1000000"
# 
# [[2]]
# [1] "\\() 1_000_000"

This patch is somewhat simplistic: it allows both multiple underscores
in succession and underscores at the end of the number literal. Perl
does so too, but with a warning:

perl -wE'say "true" if 1__000_ == 1000'
# Misplaced _ in number at -e line 1.
# Misplaced _ in number at -e line 1.
# true
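
For anyone who would rather experiment with a stricter rule before
touching gram.y, the check can be prototyped on strings in plain R.
This is only a sketch under stated assumptions: valid_underscores and
parse_underscored are hypothetical helpers, not part of the patch or of
R, and they handle decimal integer literals only (no '.', exponent,
hex or L suffix):

```r
# Hypothetical helper: TRUE when every underscore sits between two
# digits, i.e. no leading, trailing or doubled underscores.
valid_underscores <- function(lit) {
  grepl("^[0-9]+(_[0-9]+)*$", lit)
}

# Hypothetical helper: reject misplaced underscores, otherwise strip
# the separators and convert to a number.
parse_underscored <- function(lit) {
  if (!valid_underscores(lit))
    stop("misplaced _ in number: ", lit)
  as.numeric(gsub("_", "", lit, fixed = TRUE))
}

valid_underscores(c("1_000_000", "1__000", "1000_", "_1000"))
parse_underscored("1_000_000")
```

Under this rule the Perl example above, 1__000_, would be rejected
outright rather than accepted with a warning.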

-- 
Best regards,
Ivan


