[Rd] Floating point issue

Tue Jul 12 09:17:43 CEST 2022

Simon, 

I think the issue is rather that we have a representable number that is not correctly parsed by R. Step by step: 

The mentioned number 1e25 is, as you say, not representable in IEEE 754 double precision, with two closest representable numbers being

 10000000000000000905969664 (bit pattern 0 10001010010 0000100010110010101000101100001010000000001010010001)

and

 9999999999999998758486016 (bit patterns 0 10001010010 0000100010110010101000101100001010000000001010010000)

So the fact that 1e25 gets printed out as 10000000000000000905969664 makes perfect sense — it’s the closest number that can be represented. Which is exactly what one expects and also the usual guarantee — unrepresentable number literals are supposed to be parsed to a closest representable number. This is a problem that can be (and has been) solved correctly and reliably, so there is no reason why R wouldn’t offer the same guarantee here. 

But let’s get to the actual problem. If you type in 10000000000000000905969664 into R on an Apple Silicon machine you will get 

  10000000000000003053453312 (bit pattern 0 10001010010 0000100010110010101000101100001010000000001010010010)

Which is the next double in the sequence and not a correct way to parse the number.  

So unless I made a mistake somewhere it indeed looks like a bug in R’s number parsing code. It produces the expected, correctly rounded result for a non-representable 1e25, but is one bit off for the precisely representable 10000000000000000905969664

And BTW, I have tried the same with C and Swift, and it works as expected, i.e. in C 

  assert(10000000000000000000000000.0 == 10000000000000000905969664.0 );

Best, 

  Taras 

> On 12 Jul 2022, at 03:02, Simon Urbanek <simon.urbanek using R-project.org> wrote:
> 
> I don’t think there is any guarantee that unrepresentable numbers are parsed into defined patterns, because printing is done by the OS while parsing is done by R. The way R parses decimal numbers[1] is simply by using the obvious res = res * 10 + digit and it can be easily checked that for doubles the representation such obtained from 10000000000000000905969664 is 0x1.08b2a2c280292p+83 (see below if you want to see it yourself) which is not the same as 10^25 which is 0x1.08b2a2c280291p+83. This is true on all platforms, it is not specific to M1. The only difference is if your were to use a different type you can obtain a different result - and that is not well-defined (e.g. long doubles have no guarantees at all as of the precision).  Note that the decimal string above would require 83-bits of precision which is not representable.
> 
> (BTW: to make it even more fun, if you were to use double res = 1; repeat(25) res = res * 10; in C, so the naive computation of the original 10^25 you’d get 9999999999999998758486016 and 0x1.08b2a2c28029p+83)
> 
> Given that printing is done by the OS and parsing by R, I don’t think R guarantees anything. If you want representable number you’d use the binary representation (sprintf(“%a”) or hex-mode deparse as noted). One could argue that it could make sense to change it one way or another - either having R do it all or having the OS do it all. In the latter case one may obtain more consistent results (e.g. system stdtod() yields the original value even on M1), but it would be OS-specific. In the former R could impose its own guarantees - but currently it does not.
> 
> Cheers,
> Simon
> 
> [1] - https://github.com/r-devel/r-svn/blob/97c0a73f1758d09088c200f924d27b362d55ccdc/src/main/util.c#L2094
> 
> 
> #include <stdio.h>
> #include <math.h>
> #include <stdlib.h>
> 
> int main() {
>  const char *str = "10000000000000000905969664", *c = str;
>  double ans = 0;
>  while (*c) {
>    ans = 10 * ans + (*(c++) - '0');
>    printf("%a\n", ans);
>  }
>  printf("atof: %a\n", atof(str));
>  double pow1025 = pow(10.0, 25);
>  printf("--\n10^25:\n%25.f\n%a\n", pow1025, pow1025);
>  return 0;
> }
> 
> 0x1p+0
> 0x1.4p+3
> 0x1.9p+6
> 0x1.f4p+9
> 0x1.388p+13
> 0x1.86ap+16
> 0x1.e848p+19
> 0x1.312dp+23
> 0x1.7d784p+26
> 0x1.dcd65p+29
> 0x1.2a05f2p+33
> 0x1.74876e8p+36
> 0x1.d1a94a2p+39
> 0x1.2309ce54p+43
> 0x1.6bcc41e9p+46
> 0x1.c6bf52634p+49
> 0x1.1c37937e08p+53
> 0x1.6345785d8a001p+56
> 0x1.bc16d674ec801p+59
> 0x1.158e460913d01p+63
> 0x1.5af1d78b58c41p+66
> 0x1.b1ae4d6e2ef51p+69
> 0x1.0f0cf064dd593p+73
> 0x1.52d02c7e14af8p+76
> 0x1.a784379d99db6p+79
> 0x1.08b2a2c280292p+83
> atof: 0x1.08b2a2c280291p+83
> --
> 10^25:
> 10000000000000000905969664
> 0x1.08b2a2c280291p+83
> 
> 
>> On 11/07/2022, at 02:00, Antoine Fabri <antoine.fabri using gmail.com> wrote:
>> 
>> Dear r-devel,
>> 
>> For some numbers, the printed value is not equivalent to the input :
>> 
>> options(scipen = 999)
>> ## GOOD
>> 1e24
>> #> [1]  999999999999999983222784
>> 1e24 == 999999999999999983222784
>> #> [1] TRUE
>> 
>> ## BAD
>> 1e25
>> #> [1] 10000000000000000905969664
>> 1e25 == 10000000000000000905969664
>> #> [1] FALSE
>> 
>> ## STILL BAD
>> 10000000000000000905969664
>> #> [1] 10000000000000003053453312
>> 
>> ## GOOD AGAIN
>> 10000000000000003053453312
>> #> [1] 10000000000000003053453312
>> 
>> # Additionally
>> 10000000000000000000000000 == 1e25
>> #> [1] FALSE
>> 
>> Are these bugs ?
>> 
>> 	[[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

	[[alternative HTML version deleted]]