[Rd] Converting non-32-bit integers from python to R to use bit64: reticulate

Kevin Ushey kevinushey at gmail.com
Tue Jun 4 19:14:22 CEST 2019


I think a more productive conversation could be: what additions to R
would allow user-defined types / classes to behave just like the
built-in vector types? As a motivating example, one cannot currently
use the 64-bit integer objects from bit64 to subset data frames:

   > library(bit64); mtcars[as.integer64(1:3), ]
    [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
   <0 rows> (or 0-length row.names)

I think ALTREP presents a possibility here, in that we could have a
64-bit integer ALTREP object that behaves like either an INTSXP or a
REALSXP as necessary. But I'm not sure how we would handle large 64-bit
integer values that won't fit in either an INTSXP or a REALSXP (in the
REALSXP case, precision could be lost for values whose magnitude exceeds 2^53).

One possibility would be to give ALTREP objects a chance at managing
dispatch in some methods, so that, for example, in data[<ALTREP>] the
ALTREP object has the opportunity to choose how the data object should
be subsetted. Of course, this implies wiring yet another dispatch
mechanism through a category of primitive / internal functions, which
could be expensive in terms of implementation / maintenance... and I'm
not sure whether this could play well with the existing S3 / S4
dispatch mechanisms.

FWIW, I think 64-bit integers most commonly arise as database keys /
IDs, and are typically used just for subsetting / reordering data
rather than for math. In these cases, converting the 64-bit integers to
a character vector is typically a viable workaround, although it's much
slower.

Still, at least to me, it seems like there is likely a path forward
with ALTREP for 64-bit integer vectors that can behave (more or less)
just like built-in R vectors.

Best,
Kevin

On Tue, Jun 4, 2019 at 9:34 AM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Juan Telleria Ruiz de Aguirre
> >>>>>     on Mon, 3 Jun 2019 06:50:17 +0200 writes:
>
>     > Thank you Martin for making known and developing the 'Rmpfr' library for
>     > unlimited size integers (GNU C GMP) and arbitrary precision floats (GNU C
>     > MPFR):
>
>     > https://cran.r-project.org/package=Rmpfr
>
>     > My question is: in the long term (for R 3.7.0 or R 3.8.0):
>
>     > Does it have sense that GMP substitutes INTSXP, and MPFR substitutes
>     > REALSXP code? With this we would achieve that an integer is always an
>     > integer, and a numeric double precision float is always a numeric double
>     > precision float, without the occasional casting underneath.
>
>     > And would the R Community / ordinary R members be willing to help R
>     > Core with such an implementation (if it has sense, and is to be adopted)?
>
> No, such a change has "no sense" and hence won't be adopted (in
> this form):
>
> - INTSXP and REALSXP are part of the C API of R, and are well defined.
>   Changing them will almost surely break 100s and, through
>   dependencies, probably 1000s of existing R packages.
>
> - I'm sure Python and other systems do have fixed-size "double
>   precision" vectors, because that's how you interface with all
>   pre-existing computational libraries,
>   and I am almost sure that support for arbitrarily long integers
>   (or doubles) is via another class/type.
>
> - I know that Julia has adopted these (GMP and MPFR I think)
>   types and nicely interfaces them on a relatively "base" level.
>   With their nice class hierarchy (and very nice "S4 like" multi-argument
>   method dispatch for *all* functions) it can look quite
>   seamless for the user to work with these extended classes, but
>   they are not all identical to the basic "real"/"double" or "integer" classes.
>
> - I'm not the expert here (but there are not so many experts
>   ..), but I'm pretty sure that adding new "basic types" at the
>   underlying C level seems not at all easy for R.  It would mean a big
>   break in backward compatibility -- which is conceivable --
>   and *may* also need a big rewrite of much of the R code base
>   which seems less conceivable in the mid term (2-3 years; long
>   term: > 5 years).
>
>
>     > Thank you all! :)
>
> You are welcome.
>
> I think we should close this thread here,  unless some real
> experts join.
>
> Martin
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
