[Rd] Native 64-bit Integers

Juan Telleria Ruiz de Aguirre jtelleri@@rproject @ending from gm@il@com
Mon Sep 24 22:50:32 CEST 2018


Dear R Developers,

I would like to pick up again the issue of 64-bit integers in R:

http://r.789695.n4.nabble.com/Re-R-support-for-64-bit-integers-td2320024.html

*** CURRENT SITUATION ***

At the moment, all of the following are the same type:

* the length of an R vector
* the R integer type
* the C int type (fixed at 32 bits in practice)
* the Fortran INTEGER type (fixed at 32 bits by the standard)

*** OBJECTIVE ***

Introduce 64-bit integers natively into base R, ideally also allowing
them to be used as indices.
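What 64-bit indices buy is the ability to address an element beyond position 2^31 - 1 at all. A minimal C sketch, assuming a hypothetical helper `subscript_to_offset` that converts a 1-based R-style subscript into a 0-based C offset; it deliberately handles only positive, in-range subscripts (R's negative, zero and NA subscripts are out of scope here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: map a 1-based subscript held in a 64-bit integer
   to a 0-based offset, rejecting anything outside [1, length].
   Returns true and writes *offset on success. */
bool subscript_to_offset(int64_t subscript, int64_t length, int64_t *offset)
{
    if (subscript < 1 || subscript > length)
        return false;            /* out of range: caller reports an error */
    *offset = subscript - 1;     /* R subscripts are 1-based, C offsets 0-based */
    return true;
}
```

With a 32-bit subscript type, an element such as number 3000000000 of a 4e9-element vector could not even be named.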

Ideally, both of the following would become 64-bit:
* the length of an R vector
* the R integer type

This would free us from the increasingly relevant limit of 2^31 - 1
on the maximum length of an atomic object.
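The arithmetic behind that limit: a double vector of maximal length occupies (2^31 - 1) x 8 bytes, just under 16 GiB, which is no longer an unusual amount of RAM. A small C sketch of the calculation (the macro and function names are illustrative):

```c
#include <stdint.h>

/* Largest length an int-indexed vector can have: INT32_MAX = 2^31 - 1. */
#define MAX_INT32_LENGTH ((int64_t)INT32_MAX)

/* Bytes occupied by a vector of that maximum length whose elements are
   elem_size bytes each (8 for double, 4 for integer). The product is
   computed in 64 bits so that it cannot itself overflow. */
int64_t max_vector_bytes(int64_t elem_size)
{
    return MAX_INT32_LENGTH * elem_size;
}
```

For doubles this gives 17179869176 bytes, 8 bytes short of 16 GiB.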

*** DIFFICULTIES ***
a) If the R length type and the R integer type both become the same
64-bit type, replacing the current integer type, then every package
with compiled code would have to change its argument declarations to
int64 (or long, on most 64-bit systems) in C and INTEGER*8 in Fortran.

b) If the R length type changes to something /different/ from the
integer type, then all compiled code has to be checked to see whether
each C int argument is a length or an integer, which is more work and
more error-prone.

c) On the other hand, changing the integer type to 64-bit would
presumably make integer code run noticeably more slowly on 32-bit
systems.
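What difficulties (a) and (b) mean for a single compiled routine can be sketched in C. Under (a), every length argument and loop counter simply changes type; under (b), each argument must first be classified as a length or an integer value. All names here (`sum_doubles64`, `count_equal`, `r_length_t`, `r_int_t`) are hypothetical and are not R's actual API:

```c
#include <stddef.h>
#include <stdint.h>

/* Option (a): lengths and integers both become one 64-bit type.
   A .C-style routine that used to take `const int *n` now takes int64_t
   (not long, which is only 32 bits on 64-bit Windows). */
void sum_doubles64(const double *x, const int64_t *n, double *result)
{
    double total = 0.0;
    for (int64_t i = 0; i < *n; i++)   /* loop counter widened as well */
        total += x[i];
    *result = total;
}

/* Option (b): hypothetical distinct types for lengths and integer values,
   so every argument must be classified as one or the other. */
typedef ptrdiff_t r_length_t;   /* lengths/indices: 64-bit on 64-bit platforms */
typedef int32_t   r_int_t;      /* integer values: unchanged 32-bit */

/* n and the returned count are lengths; x[i] and value are integers. */
r_length_t count_equal(const r_int_t *x, r_length_t n, r_int_t value)
{
    r_length_t count = 0;
    for (r_length_t i = 0; i < n; i++)
        if (x[i] == value)
            count++;
    return count;
}
```

Under (b), swapping a length for an integer value would typically still compile without error, which is why that route is the more error-prone one.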

In any case, the changes could be postponed by giving .C/.Call an
option that forces lengths and integers to be passed as 32-bit values.
Code using that option could not use large integers or large vectors,
but it would keep working indefinitely.
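Such a compatibility option would need a checked conversion at the .C/.Call boundary, so that legacy code fails loudly on a large vector instead of receiving a truncated length. A minimal sketch, assuming a hypothetical helper `narrow_to_int32`; how R would actually report the failure is left open:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for a 32-bit compatibility mode in .C/.Call:
   a 64-bit length or integer is handed to legacy code only if it fits
   in an R integer. INT32_MIN is excluded because R uses it as NA_integer_. */
bool narrow_to_int32(int64_t value, int32_t *out)
{
    if (value <= INT32_MIN || value > INT32_MAX)
        return false;          /* too large for legacy code: raise an error instead */
    *out = (int32_t)value;
    return true;
}
```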

*** 2010 SOLUTION ***

There were two possibilities at the time:
a) using 64-bit integers;
b) using "double precision integers", i.e. integer values stored as
doubles.
Option (b) was the solution finally chosen in 2010, so that not every R
package with compiled code would have to be patched extensively.
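The trade-off of "double precision integers" is their range: a double has a 53-bit significand, so integers are exact only up to 2^53 (about 9.0e15), still far beyond 2^31 - 1. A C sketch that tests whether a 64-bit integer survives a round trip through double (the function name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* A double has a 53-bit significand, so every integer of magnitude up to
   2^53 = 9007199254740992 is stored exactly; above that, gaps appear
   (2^53 + 1, for instance, rounds to 2^53). */
bool exact_as_double(int64_t n)
{
    double d = (double)n;
    /* INT64_MAX rounds up to 2^63, which does not fit back into int64_t;
       check before converting back to avoid undefined behaviour. */
    if (d >= 9223372036854775808.0)
        return false;
    return (int64_t)d == n;
}
```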

*** BIT64 PACKAGE ***
Nowadays we have the 'bit64' package, which provides serializable S3
atomic 64-bit (signed) integers (about +-2^63).

But these are not a replacement for 32-bit integers, as integer64
values:
* are not supported for subscripting;
* have different semantics when combined with double, e.g.
integer64 + double => integer64.

https://cran.r-project.org/web/packages/bit64/index.html
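For comparison, C itself resolves mixed operands by widening: int32 + int64 gives int64, and int64 + double gives double, at the cost of precision for int64 values above 2^53. This is the opposite of bit64's integer64 + double => integer64 rule. A minimal C illustration (function names are illustrative):

```c
#include <stdint.h>

/* C's usual arithmetic conversions move to the wider type: int32_t + int64_t
   is computed as int64_t, and int64_t + double as double. The return types
   make each result type explicit. */
int64_t add_int32_int64(int32_t a, int64_t b) { return a + b; }
double  add_int64_double(int64_t a, double b) { return a + b; }
```

Under a "double wins" rule, 9007199254740993 (2^53 + 1) becomes the double 9007199254740992 before the addition even happens.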

*** PROPOSAL ***

Instead of seeing 64-bit integers as a substitute for 32-bit integers,
they could be included in base R as a new, additional data type that
co-exists with:
a) 32-bit integers;
b) "double precision integers".

This new data type could:
* Be based on (ported from) the "bit64" package: https://github.com/cran/bit64
* Allow the int64 data type to be used for subscripting.
* Have functions and coercion rules such as:
  as.integer64()
  is.integer64()
  integer + integer64 => integer64
  double + integer64 => double
* Be written with a double "L" suffix, e.g. 3478327489327489233LL
(this would be integer64, not double).
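A signed 64-bit integer holds at most 9223372036854775807 (19 digits), so whatever reads an "LL" literal has to reject out-of-range input rather than silently fall back to double. A C sketch using the standard `strtoll`, with a hypothetical function name; R's actual parser is not modelled here:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for the digits of an "LL" literal: accepts the text
   only if it is entirely a decimal integer that fits in 64 bits. */
bool parse_ll_literal(const char *digits, int64_t *out)
{
    char *end;
    errno = 0;
    long long v = strtoll(digits, &end, 10);
    if (end == digits || *end != '\0')
        return false;            /* empty, or trailing non-digit characters */
    if (errno == ERANGE)
        return false;            /* magnitude exceeds the 64-bit range */
    *out = (int64_t)v;
    return true;
}
```

A 23-digit literal such as 34783274893274892334279 is rejected, since it exceeds 2^63 - 1 by several orders of magnitude.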

By doing so, existing packages would not need to be recompiled and
could keep working as they already do, so no backward-incompatible
change would be introduced.

*** FINAL KEY IDEA ***

Take the already developed "bit64" package (https://github.com/cran/bit64)
as the basis for a new Integer64 type system that co-exists natively
in R with Integer32 (just as the "parallel" package was once included
into base R, for example), and build improvements on top of it.


