[Rd] Floating point issue

Simon Urbanek @|mon@urb@nek @end|ng |rom R-project@org
Tue Jul 12 03:02:01 CEST 2022


I don’t think there is any guarantee that unrepresentable numbers are parsed into defined patterns, because printing is done by the OS while parsing is done by R. The way R parses decimal numbers[1] is simply by using the obvious res = res * 10 + digit and it can be easily checked that for doubles the representation such obtained from 10000000000000000905969664 is 0x1.08b2a2c280292p+83 (see below if you want to see it yourself) which is not the same as 10^25 which is 0x1.08b2a2c280291p+83. This is true on all platforms, it is not specific to M1. The only difference is if your were to use a different type you can obtain a different result - and that is not well-defined (e.g. long doubles have no guarantees at all as of the precision).  Note that the decimal string above would require 83-bits of precision which is not representable.

(BTW: to make it even more fun, if you were to use double res = 1; repeat(25) res = res * 10; in C, so the naive computation of the original 10^25 you’d get 9999999999999998758486016 and 0x1.08b2a2c28029p+83)

Given that printing is done by the OS and parsing by R, I don’t think R guarantees anything. If you want representable number you’d use the binary representation (sprintf(“%a”) or hex-mode deparse as noted). One could argue that it could make sense to change it one way or another - either having R do it all or having the OS do it all. In the latter case one may obtain more consistent results (e.g. system stdtod() yields the original value even on M1), but it would be OS-specific. In the former R could impose its own guarantees - but currently it does not.

Cheers,
Simon

[1] - https://github.com/r-devel/r-svn/blob/97c0a73f1758d09088c200f924d27b362d55ccdc/src/main/util.c#L2094


#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main() {
  const char *str = "10000000000000000905969664", *c = str;
  double ans = 0;
  while (*c) {
    ans = 10 * ans + (*(c++) - '0');
    printf("%a\n", ans);
  }
  printf("atof: %a\n", atof(str));
  double pow1025 = pow(10.0, 25);
  printf("--\n10^25:\n%25.f\n%a\n", pow1025, pow1025);
  return 0;
}

0x1p+0
0x1.4p+3
0x1.9p+6
0x1.f4p+9
0x1.388p+13
0x1.86ap+16
0x1.e848p+19
0x1.312dp+23
0x1.7d784p+26
0x1.dcd65p+29
0x1.2a05f2p+33
0x1.74876e8p+36
0x1.d1a94a2p+39
0x1.2309ce54p+43
0x1.6bcc41e9p+46
0x1.c6bf52634p+49
0x1.1c37937e08p+53
0x1.6345785d8a001p+56
0x1.bc16d674ec801p+59
0x1.158e460913d01p+63
0x1.5af1d78b58c41p+66
0x1.b1ae4d6e2ef51p+69
0x1.0f0cf064dd593p+73
0x1.52d02c7e14af8p+76
0x1.a784379d99db6p+79
0x1.08b2a2c280292p+83
atof: 0x1.08b2a2c280291p+83
--
10^25:
10000000000000000905969664
0x1.08b2a2c280291p+83


> On 11/07/2022, at 02:00, Antoine Fabri <antoine.fabri using gmail.com> wrote:
> 
> Dear r-devel,
> 
> For some numbers, the printed value is not equivalent to the input :
> 
> options(scipen = 999)
> ## GOOD
> 1e24
> #> [1]  999999999999999983222784
> 1e24 == 999999999999999983222784
> #> [1] TRUE
> 
> ## BAD
> 1e25
> #> [1] 10000000000000000905969664
> 1e25 == 10000000000000000905969664
> #> [1] FALSE
> 
> ## STILL BAD
> 10000000000000000905969664
> #> [1] 10000000000000003053453312
> 
> ## GOOD AGAIN
> 10000000000000003053453312
> #> [1] 10000000000000003053453312
> 
> # Additionally
> 10000000000000000000000000 == 1e25
> #> [1] FALSE
> 
> Are these bugs ?
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 



More information about the R-devel mailing list