Rounding to Decimal Digits in Binary

Intro

In R (and its predecessors S and S+), we have always known and often used round(x, digits) to round a numeric (or complex) vector of numbers x to digits decimals after the (decimal) point. However, not many R users (nor scientists for that matter) have been aware of the fact that such rounding is not trivial because our computers use binary (base 2) arithmetic and we are rounding to decimal digits, aka decimals, i.e., in base 10.

On the topic of floating point computation, we have had the most frequently asked question (FAQ) about R, the infamous R FAQ 7.31, and in 2017, Romain François even created an R package seven31 (not on CRAN) to help useRs exploring what we say in the FAQ.

Recently, there has been an official R bug report (on R’s bugzilla <bugs.r-project.org>), PR#17668 with summary title Artificial numerical error in round() causing round to even to fail.

Adam Wheeler started with a shorter version (just using digits = 1,2,..,8) of the following examples and his own remarks about correctness:

op <- options(digits = 15)
##' maybe add to package -- it's really useful here :
formatF <- function(x, ...) noquote(format(x, scientific=FALSE, drop0trailing=TRUE, ...))
formatF(rbind(           # R <= 3.6.x :
round(55.5, 0),          # 56 <- correct
round(55.55, 1),         # 55.5
round(55.555, 2),        # 55.55
round(55.5555, 3),       # 55.556 <- correct
round(55.55555, 4),      # 55.5555
round(55.555555, 5),     # 55.55555
round(55.5555555, 6),    # 55.555555
round(55.55555555, 7),   # 55.5555556 <- correct
round(55.555555555, 8),  # 55.55555555
round(55.5555555555, 9), # 55.555555555
round(55.55555555555,10),# 55.5555555556 <- correct
round(55.555555555555,11)# 55.55555555556 <- correct
))

##       [,1]          
##  [1,] 56            
##  [2,] 55.5          
##  [3,] 55.56         
##  [4,] 55.556        
##  [5,] 55.5555       
##  [6,] 55.55556      
##  [7,] 55.555555     
##  [8,] 55.5555556    
##  [9,] 55.55555555   
## [10,] 55.555555556  
## [11,] 55.5555555556 
## [12,] 55.55555555556

(Note that we eventually will not agree on the above correct judgements, as the matter is more subtle than one might think at first)

Whereas the exact result of the R code above currently depends on your version of R, our round package’s roundX(x, dig, version = "r1.C") now provides these, using the same C source code as R 3.6.2 (Note we adopt the convention to use "r<n>.C", ending in .C for round()ing versions where R calls package C code):

require(round)

## Loading required package: round

formatF(rbind(
  roundX(55.5, 0, "r1.C"),
  roundX(55.55, 1, "r1.C"),
  roundX(55.555, 2, "r1.C"),
  roundX(55.5555, 3, "r1.C"),
  roundX(55.55555, 4, "r1.C"),
  roundX(55.555555, 5, "r1.C"),
  roundX(55.5555555, 6, "r1.C"),
  roundX(55.55555555, 7, "r1.C"),
  roundX(55.555555555, 8, "r1.C"),
  roundX(55.5555555555, 9, "r1.C"),
  roundX(55.55555555555,10, "r1.C"),
  roundX(55.555555555555,11, "r1.C")))

##       [,1]          
##  [1,] 56            
##  [2,] 55.5          
##  [3,] 55.55         
##  [4,] 55.556        
##  [5,] 55.5555       
##  [6,] 55.55555      
##  [7,] 55.555555     
##  [8,] 55.5555556    
##  [9,] 55.55555555   
## [10,] 55.555555555  
## [11,] 55.5555555556 
## [12,] 55.55555555556

Adam (see above) used his own C code to see what happens in R’s C code for round() and proposed to simplify the C code, not doing offset calculations (which substract the integer part intx, round and re-add intx), as now available with roundX(*, "r0.C"). This has started an “investigative” story which got quite a bit longer than anticipated.

The Easy Problem “in Theory”

Rounding to Integers

Well, rounding is supposedly quite simple and most of us learned a version of it in our first years of school when learning the (decimal) numbers, calculating and then simple decimal fractions. What most pupils learn (first) is round up, and round down (to an integer), and then end with the “best” rounding to nearest i.e., rounding to the nearest integer, notably using “round half up”, which mathematically is simply computing $y = r_0(x) := \left\lfloor x + 0.5 \right\rfloor$, see e.g., Wikipedia, Rounding, or when taking negative numbers into account and ensuring symmetry, $y = \mathrm{sgn}(x) \left\lfloor \left| x \right| + 0.5 \right\rfloor = -\mathrm{sgn}(x) \left\lceil -\left| x \right| - 0.5 \right\rceil$, where the “floor” $\lfloor x \rfloor$ and “ceiling” $\lceil x \rceil$ operators are defined (customary in mathematics) as

\[ \lfloor x \rfloor := \max_{n \in \mathbb{Z}} \{n \le x\}, \] and \[ \lceil x \rceil := \min_{n \in \mathbb{Z}} \{n \ge x\}. \]

Whereas these (rounding to nearest and “round half up”) rules, also called “commercial rounding”, are widely taught and used, they lead to a small bias e.g., when numbers have all the same sign, and for this reason, for computing, round_half_to_even has been proposed and adopted as default rounding mode in the IEEE 754 floating-point standard and it corresponds to C and C++ math library function nearbyint(), and has been called convergent rounding, statistician’s rounding, bankers’ rounding (and more), see Wikipedia, Round half to even. It is also what you get with R’s round(x), as round()’s second argument has default 0:

str(round)

## function (x, digits = 0)

So, for now, let’s set and use

\[ round(x) := r(x) := nearbyint(x) \]

Note that all this rounding to integer is defined independently of the base of your number representation system (such as “decimal” or “binary” ).

Rounding to non-zero Digits

Things change slightly, when rounding to a non-zero number of decimal places, i.e., digits. In theory, i.e., if computers would compute mathematically perfectly accurately, also this has a simple solution:

\[ round(x, d) := r(x \cdot 10^d) / 10^d \]

for all integer $d \in \mathbb{Z}$ (i.e., including negative digits $d$).

All the practical problems stem from the fact that on a (binary arithmetic) computer, the number $x/10$ cannot be represented exactly in binary unless $x$ is an integer multiple of 5. Indeed, the simple fraction 1/5 is an infinite length binary fraction: \[ \frac 1 5 = \frac{1 \cdot \frac{3}{16}}{5 \cdot \frac{3}{16}} = \frac{\frac{3}{16}}{1 - \frac{1}{16}} = \frac{3}{16} \frac{1}{1 - \frac{1}{16}} = \left(\frac{1}{8} + \frac{1}{16}\right) \cdot \left(1 + \frac{1}{16} + \frac{1}{16^2} + \ldots\right) \\ = (0.001100110011001100110011\ldots\ldots)_2 . \]

Versions of round()ing - The Story

At first, R bugzilla, comment #6, I’ve committed a version of Adam’s proposal to R-devel (R svn r77609, on 2019-12-21, ¹) but found that the simplification improved the above examples in that it always rounded to even, but it clearly broke cases that were working correctly in R 3.6.x. That version is available with our roundX(*, version = "r0.C") (Version 0 as it is even simpler than version 1).

One CRAN package had relied on round(x, digits = .Machine$integer.max) to return integers unchanged, but

roundX(c(-2,2), digits = .Machine$integer.max, version = "r0.C")

## [1] -Inf  Inf

gave non-sense, and there were less extreme cases of relatively large digits which had stopped working with “r0”. See the two roundX() versions via simple wrapper roundAll():

i <- c(-1,1)* 2^(33:16)
stopifnot(i == floor(i)) # are integer valued
roundAll(i, digits = 300, versions = c("r0.C", "r1.C"))

##             r0.C        r1.C
##  [1,]       -Inf -8589934592
##  [2,]        Inf  4294967296
##  [3,]       -Inf -2147483648
##  [4,]        Inf  1073741824
##  [5,]       -Inf  -536870912
##  [6,]        Inf   268435456
##  [7,] -134217728  -134217728
##  [8,]   67108864    67108864
##  [9,]  -33554432   -33554432
## [10,]   16777216    16777216
## [11,]   -8388608    -8388608
## [12,]    4194304     4194304
## [13,]   -2097152    -2097152
## [14,]    1048576     1048576
## [15,]    -524288     -524288
## [16,]     262144      262144
## [17,]    -131072     -131072
## [18,]      65536       65536

Looking at these, I also found that internally, R’s round() had effectively worked as if digits <- pmin(308, digits), i.e., truncated digits larger than 308. This is clearly not good enough for very small numbers (in absolute value),

e <- 5.555555555555555555555e-308
d <- 312:305 ; names(d) <- paste0("d=", d)
roundAll(e, d, versions = c("r0.C", "r1.C", "r2.C"))

##         r0.C   r1.C        r2.C
## d=312 6e-308 6e-308 5.5556e-308
## d=311 6e-308 6e-308 5.5560e-308
## d=310 6e-308 6e-308 5.5600e-308
## d=309 6e-308 6e-308 5.6000e-308
## d=308 6e-308 6e-308 6.0000e-308
## d=307 1e-307 1e-307 1.0000e-307
## d=306  0e+00  0e+00  0.0000e+00
## d=305  0e+00  0e+00  0.0000e+00

As I was embarrassed to have blundered, I’ve worked and committed what now corresponds to roundX(*, version = "r2.C") to R-devel (R svn r77618, on 2019-12-24, 16:11).

Also, Jeroen Ooms, maintainer of CRAN package jsonlite, contacted the CRAN team and me about the change in R-devel which broke one regression test of that package, and on Dec 27, he noticed that R 3.6.2’s version of round() was seemingly compatible ² with the (C library dependent) versions of sprintf() and also with R’s format() whereas the R-devel versions where not, for his example: (Note: format(x, digits = d) uses significant digits d, not digits after the decimal point as round() does!)

x <- 9.18665
format(x, digits = 5) #  "9.1867"

## [1] "9.1867"

sprintf("%.4f", x)    #  "9.1867"

## [1] "9.1867"

## but -- showing now
roundAll(x, 4)

## sprintf    r0.C    r1.C   r1a.C    r2.C   r2a.C    r3.C   r3d.C      r3 
##  9.1867  9.1866  9.1867  9.1867  9.1866  9.1866  9.1866  9.1866  9.1866

## but note that
print(x, digits=18)

## [1] 9.1866500000000002

which (typically) shows 9.1866500000000002, i.e., a number closer to 9.1867 than to 9.1866, and so really should be rounded up, not down. However, that is partly wrong: Whereas it is true that it’s closer to 9.1867 than to 9.1866, one must be aware that these two decimal numbers are neither exactly representable in binary double precision, and a further careful look shows actually the double precision version of these rounded numbers do have the exact same distance to x and that the main principle round to nearest here gives a tie:

print(rbind(9.1866, 9.18665, 9.1867), digits=18)

##                     [,1]
## [1,] 9.18660000000000032
## [2,] 9.18665000000000020
## [3,] 9.18670000000000009

(dx <- c(9.1866, 9.1867) - x) # [1] -4.99999999998835e-05  4.99999999998835e-05

## [1] -4.99999999998835e-05  4.99999999998835e-05

diff(abs(dx)) # is zero !

## [1] 0

options(digits=7) # revert to (typical) default

and because of the tie, the round to even rule must apply which means rounding down to 9.1866, and so both libc’s printf and hence R’s sprintf() are as wrong as R 3.6.x has been, and indeed all our roundX() versions apart from "sprintf" and the ’“r1*“’ (previous R) ones, do round down:

roundAll(x, 4)

## sprintf    r0.C    r1.C   r1a.C    r2.C   r2a.C    r3.C   r3d.C      r3 
##  9.1867  9.1866  9.1867  9.1867  9.1866  9.1866  9.1866  9.1866  9.1866

Finally, I think we’ve seen the light and on one hand recalled what we have known for long (but most R users will not be aware of at all)

Almost all finite decimal fractions are not (exactly) representable as binary double precision numbers,

and consequently,

round to nearest applies much more often directly rather than via the tie breaking rule round to even even for the case where the decimal fraction ends in a 5.

and hence, a “correct” implementation must really measure, not guess which of the two possible decimals is closer to x. This lead to our R level algorithm round_r3() which is the workhorse used by roundX(x,d, version = "r3") :

round_r3

## function(x, d, info=FALSE, check=TRUE) {
##     if(check)
##         stopifnot(!anyNA(d), length(d) == 1L) # length(x) is arbitrary
##     max10e <- 308L # in C, =  (int) DBL_MAX_10_EXP; // == 308 ("IEEE")
##     if(d > +max10e + 15L) # assuming DBL_DIG = 15
##         return(x)
##     else if(d < -max10e)
##         return(0. * x)
##     ## else  # "regular" digits ---------------------------
##     p10 <- 10^d
##     x10 <- as.vector(p10*x) # drop attributes for computation
##     xd <- (i10 <- floor(x10)) / p10 # = x, rounded [d]own
##     xu <-       ceiling(x10)  / p10 # = x, rounded [u]p
##     ## should have xd <= x <= xu
##     D <- (xu - x) - (x - xd)
##     ## D >  0 ==> xu is farther away from x than xd ==> round *down*
##     ## D <  0 ==> xu is closer  to        x   .. .. ==> round *up*
##     ## D == 0 ==> both in *same* distance: round "to even"
##     e <- i10 %% 2 # = 1  <==>  i10 is odd <==> "even" is *up*
##     r <- x
##     i <- (D < 0) | (e & (D == 0)) # round up
##     r[ i] <- xu[ i]
##     r[!i] <- xd[!i]
##     if(info) list(r=r, D=D, e=e) else r
## }
## <bytecode: 0xc4eac50>
## <environment: namespace:round>

and its two C level versions "r3.C" (using long double) and "r3d.C" (“d” for “double”, as it uses double precision only).

For R 4.0.0, this (equivalent of "r3d.C"), has been used for round(x, digits) now, as does not use long double and is very close to the R level implementation "r3" (i.e., round_r3()) makes it potentially less platform dependent and easier to explain and document.

Lastly, note that the original set of examples is then treated differently from all previous proposals:

op <- options(digits = 15, width = 2*3*23)
formatF(rbind(
 d0 = roundAll(55.5, 0)
,d1 = roundAll(55.55, 1)
,d2 = roundAll(55.555, 2)
,d3 = roundAll(55.5555, 3)
,d4 = roundAll(55.55555, 4)
,d5 = roundAll(55.555555, 5)
,d6 = roundAll(55.5555555, 6)
,d7 = roundAll(55.55555555, 7)
,d8 = roundAll(55.555555555, 8)
,d9 = roundAll(55.5555555555, 9)
,d10= roundAll(55.55555555555,10))); options(op)

##     sprintf       r0.C          r1.C          r1a.C         r2.C          r2a.C         r3.C          r3d.C         r3           
## d0  56            56            56            56            56            56            56            56            56           
## d1  55.5          55.6          55.5          55.5          55.6          55.6          55.5          55.5          55.5         
## d2  55.55         55.56         55.55         55.55         55.56         55.56         55.56         55.56         55.56        
## d3  55.556        55.556        55.556        55.556        55.556        55.556        55.556        55.556        55.556       
## d4  55.5555       55.5556       55.5555       55.5555       55.5556       55.5556       55.5555       55.5555       55.5555      
## d5  55.55555      55.55556      55.55555      55.55555      55.55556      55.55556      55.55556      55.55556      55.55556     
## d6  55.555555     55.555556     55.555555     55.555555     55.555556     55.555556     55.555555     55.555555     55.555555    
## d7  55.5555556    55.5555556    55.5555556    55.5555556    55.5555556    55.5555556    55.5555556    55.5555556    55.5555556   
## d8  55.55555555   55.55555556   55.55555555   55.55555555   55.55555556   55.55555556   55.55555555   55.55555555   55.55555555  
## d9  55.555555555  55.555555556  55.555555555  55.555555555  55.555555556  55.555555556  55.555555556  55.555555556  55.555555556 
## d10 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556 55.5555555556

Alternative Approaches

As I had asked for comments about my proposal(s), on the R-devel mailing lists, Steven Dirkse (@ GAMS) mentioned he was not quite happy with the considerations above, and in the end (not on-list) summarized his own rational approach in the following way – which I like to present as it is coherent and simple, summarized as:

all double precision numbers are rationals mathematically, i.e., $\in \mathbb{Q}$.
round(x, n) as a mathematical function is well defined unambigously on the rationals. In our notation above, simply $round(x, n) := r(x \cdot 10^n) / 10^n$, where $r(x) := round(x) = nearbyint(x)$, using round to even if desired.

These two define exact rounding and he would want R to use that. Note for that, R would have to use exact arithmetic with rational numbers which has been available for years via the C library GNU MP aka GMP and in R via CRAN pkg gmp. In the vignette Q Round, we indeed use CRAN package gmp to perform such exact rounding using exact rational numbers, and compare these with the (double precision arithmetic) algorithms we have introduced here.

However, even if R would “come with GMP” in the future (which is conceivable for other reasons), we should note that this would still keep rx <- round(x, n) in R inaccurate in most cases, since rx is numeric, i.e., a double precision number, and almost all rational numbers are not exactly representable as double.

Session information

(Note half a dozen non-standard packages present only as dependences of rmarkdown we use for rendering this vignette)

## R version 4.3.2 Patched (2024-01-06 r85796)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora Linux 38 (Thirty Eight)
## 
## Matrix products: default
## BLAS:   /u/maechler/R/D/r-patched/F38-64-inst/lib/libRblas.so 
## LAPACK: /usr/lib64/liblapack.so.3.11.0
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] round_0.21-0.2
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.33   R6_2.5.1        fastmap_1.1.1   xfun_0.41      
##  [5] cachem_1.0.8    knitr_1.45      htmltools_0.5.7 rmarkdown_2.25 
##  [9] lifecycle_1.0.4 cli_3.6.2       sass_0.4.8      jquerylib_0.1.4
## [13] compiler_4.3.2  tools_4.3.2     evaluate_0.23   bslib_0.6.1    
## [17] yaml_2.3.8      rlang_1.1.2     jsonlite_1.8.7

using Winston Chang’s semi-official mirror of R’s official SVN repository at https://svn.r-project.org/R/ (as https://svn.r-project.org/R/trunk@77609 has not been enabled in its server)↩︎
do note that sprintf() and R 3.6.x’s version of round() are not at all equivalent, even if they are for Jeroen’s example↩︎