[Rd] Feature Request: Allow Underscore Separated Numbers

Fri Jul 15 17:25:32 CEST 2022

André,

I am not saying a change cannot be done and am not familiar enough with the
internals of R. If you just want the interpreter to evaluate CONSTANTS in
the code as what you consider syntactic sugar and replace 1_000 with 1000,
that sounds superficially possible. But is it?

R normally delays evaluation so chunks of code are handed over untouched to
functions that often play with the text directly without evaluating it
until, perhaps, much later. And I have pointed out how much work is done
with things like regular expressions or reading things in from a file that
is not done in the REPL but in functions behind the scene. So if there is
any way for a number to slide in without being modified, or places where you
want the darn underscores preserved, you may well cause a glitch.
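
To make the delayed-evaluation point concrete, here is a small sketch. It is an illustration only, not a description of R's internals, and show_arg is a made-up helper name. It suggests that once the parser has read a numeric literal, the way it was typed is no longer available to code that works on the deparsed text.

```r
# Hypothetical helper: returns the text of its unevaluated argument.
show_arg <- function(x) deparse(substitute(x))

show_arg(a + 1000)    # "a + 1000": captured as code, never evaluated
deparse(quote(1e3))   # "1000": the source spelling of the literal is not kept
deparse(quote(0x10))  # "16"
```

If 1_000 were ever accepted as a literal, it would presumably be normalized the same way, though that is an assumption and not something stated in this thread.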

Languages that design in the ability have obviously dealt with issues and
presumably anyone writing code anew can use a new definition in their work
so they handle such numbers. I am not saying such a change cannot be done,
simply that existing languages are careful about making changes as they
strive to retain compatibility.

So even assuming your statement about not needing to change as.numeric or
read.csv functions is true, aren't you introducing a change in which the
users will inadvertently use the feature in strings or files and assume it
is a globally recognized feature? I use CSV files and other such formats
quite a bit as a way to exchange data between R and other environments and
unless they all change and allow underscores in numbers, there can be
issues. So, yes, you are suggesting nothing in R will write out numbers with
underscores. But if others do and you import the data into R with a reader
that does not understand, we have anomalies.
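
A short sketch of the import anomaly described above, using made-up data. It assumes only base R; the column names and values are invented for illustration.

```r
# as.numeric() does not treat an underscore-separated string as a number.
x <- suppressWarnings(as.numeric("1_000"))
is.na(x)  # TRUE

# A reader hits the same problem: the column does not come in as numeric.
df <- read.csv(text = "id,amount\n1,1_000\n2,2_500")
is.numeric(df$amount)  # FALSE

# One possible workaround: strip the underscores explicitly, then convert.
as.numeric(gsub("_", "", as.character(df$amount), fixed = TRUE))  # 1000 2500
```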

I am not arguing with anyone about this. Like many proposed features, it
sounds reasonable just by itself. But for a language that was crafted and
then modified many times, the burden is often on those wanting a change to
convince us that it can be done benignly, effectively and cheaply AND that
it is more worthwhile than a thousand other pending ideas already submitted.

I have never used str2lang() in my life directly so would changing that
really help if as.numeric() and other such functions were left alone and did
not call it? What if I read in a .CSV a line at a time and use various
methods including regular expressions to split the line into parts and then
make the parts into numbers based on some primitive algorithm that maps
digits 0-9 into small integers 0-9 and then positionally multiplies digits
to the left by 10 for each level and adds them up? Will that algorithm know
about underscores and not only ignore them but keep track of how many times
it multiplies the other parts by 10? Sure, we can write a new algorithm with
added complexity but in my view, we can solve the problem in the few cases
it matters without such a change.
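
The primitive positional algorithm described above might look roughly like this. Both function names are hypothetical, and this is a sketch of the idea, not code taken from any real reader.

```r
# Hypothetical positional parser: knows digits 0-9 and nothing else.
naive_to_number <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  total <- 0
  for (ch in chars) {
    digit <- match(ch, as.character(0:9)) - 1  # NA for anything that is not 0-9
    total <- total * 10 + digit                # shift left one place, add digit
  }
  total
}

naive_to_number("1000")   # 1000
naive_to_number("1_000")  # NA: the underscore is not a digit

# Underscore-aware variant: drop "_" first so it never shifts the total.
tolerant_to_number <- function(s) {
  naive_to_number(gsub("_", "", s, fixed = TRUE))
}
tolerant_to_number("1_000")  # 1000
```

The second function is the "added complexity" in question: every hand-rolled parser of this kind would need an equivalent step.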

Had this been built in originally, maybe not a problem. But consider the
enormous expense of UNICODE and the truly major upheaval needed to get it
working at a time when lots of code using pointers had a reasonable
expectation that all characters took up the same number of bytes, and
calculating the length of a string could be done by simply subtracting one
pointer from another. Now, you actually have to read the entire string and
count code points, or keep the length as a part of the structure that is
changed any time it changes and so on.

But arguably UNICODE support is now required in many cases. So, yes,
underscores in numbers may become commonplace and cause headaches for a
while. But mathematically, I don't see them as needed and see many ways to
allow a programmer to see what a number is without any problems in the few
times they want it. Cut and paste in code can easily take out any snippet
accurately and pluck it into a function that displays it with commas or
whatever. But definitely, lazy humans constantly make mistakes and even with
this would still make some.
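
As one example of displaying a number readably without any change to the language, base R's formatC() can insert separators. The alternatives below it are ordinary literals and expressions that already parse today.

```r
# Show a large number with thousands separators; the value itself is unchanged.
formatC(1000000, format = "d", big.mark = ",")  # "1,000,000"

# Spellings that keep the magnitude visible in source code.
1e6 == 1000000           # TRUE
3 * 100 * 1000 == 3e5    # TRUE
```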

But if R developers seem confident this change can be done, go for it!
Numeric literals, like other constants, have often been something compiled
languages have optimized out of the way, such as combining multiple
instances of the same one into one memory location.

Avi

From: GILLIBERT, Andre <Andre.Gillibert using chu-rouen.fr> 
Sent: Friday, July 15, 2022 2:31 AM
To: avi.e.gross using gmail.com; r-devel using r-project.org
Subject: RE: [Rd] Feature Request: Allow Underscore Separated Numbers

On 2022-07-14 8:21 p.m., avi.e.gross using gmail.com wrote:
> Devin,
>
> I cannot say anyone wants to tweak R after the fact to accept numeric
> items with underscores as that might impact all kinds of places.
>

I am not sure that the feature request of Devin Marlin was correctly
understood.

I guess that he thought about adding syntactic sugar to numeric literals in
the language.

Functions such as as.numeric(), or read.csv() would not be changed.

The main difference would be to make valid code that currently is a "syntax
error", such as:

> 3*100_000

Error: unexpected input in "3*100_"

Breaking code with that feature is possible but improbable.

Indeed, code expecting str2lang("3*100_000") to raise a syntax error
(and catching that error with try) would break.

Most code generating other code then parsing it with str2lang() should be
fine, because it would generate old-style code with normal numeric
constants.
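
A minimal sketch of the kind of code that would be affected, assuming R as it behaved at the time of this thread. The helper name is_syntax_error is hypothetical.

```r
# Hypothetical helper: TRUE if the text fails to parse, FALSE if it parses.
is_syntax_error <- function(code) {
  tryCatch({
    str2lang(code)
    FALSE
  }, error = function(e) TRUE)
}

is_syntax_error("3*100000")   # FALSE
is_syntax_error("3*100_000")  # TRUE today; would become FALSE under the proposal
```

Only code that relies on the second call returning TRUE would change behavior. Generated code that uses plain numeric constants, like the first call, would parse as before.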

-- 

Sincerely

André GILLIBERT
