[R] Precision in R
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Mon Feb 26 09:21:12 CET 2018
On Sun, 25 Feb 2018, Iuri Gavronski wrote:
> Hi,
>
> Why does sum() on a 10-item vector produce a different value than its
> counterpart on a 2-item vector? I understand the problems related to
> arithmetic precision when storing decimal numbers in binary format,
> but shouldn't the errors be equal regardless of the method used?
Since you understand the problems, why are you asking? More to the point,
why are you asking as if this were a property of R rather than a property
of all typical IEEE 754 floating point implementations?
For others Googling this: the short answer is don't make your program
behavior depend on the least significant digits... they are not reliable.
Read about the difference between double precision (53 bits, or about 15
digits [1]) and extended precision (64 bits, or about 19 digits for Intel
math coprocessors [2]), and ask questions about this topic in an
appropriate forum (e.g. [3]), since this issue can crop up in practically
any programming language.
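A minimal sketch of what "don't depend on the last digits" looks like in
practice (all.equal() is the standard R tool here; the explicit 1e-12
tolerance below is an arbitrary choice for illustration):

  y <- 0.1 + 0.2
  y == 0.3                   # FALSE: the last bits of the two doubles differ
  isTRUE(all.equal(y, 0.3))  # TRUE: equal within a small numeric tolerance
  abs(y - 0.3) < 1e-12       # TRUE: an explicit tolerance test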
> See my example:
Examples are good. Diving into numerical analysis theory not so much.
>> options(digits=22)
Don't be misleading... you are not going to get 22 digits with standard
numeric types. Note that you get 17 digits below, and two of those are
artifacts pulled from the artificially-expanded extended-precision values
in the coprocessor... the numbers in memory are not that precise. The best
that can be said for printing 17-digit values is that they give you the
best shot at reproducing the actual 15 digits of double precision in a
different program or instance of R if you keep them.
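You can see this by printing a double at increasing precision (a sketch;
the trailing digits shown are what IEEE 754 doubles typically produce):

  print(0.1, digits = 17)   # 0.10000000000000001: enough digits to round-trip
  sprintf("%.25f", 0.1)     # exact decimal expansion of the stored binary
                            # value; digits past the 17th add no information
  as.numeric("0.10000000000000001") == 0.1  # TRUE: 17 digits reproduce it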
>> x=rep(.1,10)
>> x
> [1] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [4] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [7] 0.10000000000000001 0.10000000000000001 0.10000000000000001
> [10] 0.10000000000000001
These values are sitting in memory at double precision (not extended
precision!)
>> sum(x)
> [1] 1
sum() is vectorized (compact compiled C code)... the intermediate values
can stay in the coprocessor at extended precision. Only the result is
trimmed back to double precision.
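You can demonstrate the difference by forcing the same additions to round
to double at every step. A sketch (Reduce() accumulates in R code, so each
partial sum is stored as a double; on platforms where C long double is no
wider than double, the two results may match):

  x <- rep(0.1, 10)
  print(sum(x), digits = 17)          # 1: accumulated at extended precision
  print(Reduce(`+`, x), digits = 17)  # 0.99999999999999989: rounded each step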
>> y=0
>> for (i in 1:10) {y=sum(y+x[i])}
Why are you calling the sum function on a scalar value? The addition has
already taken place...
The intermediate values in this for loop are stored in RAM at double
precision, re-expanded to extended precision each time an addition occurs,
and then rounded back to double when the addition is done. That is a lot
of rounding compared to the sum function.
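Dropping the redundant sum() call gives the same answer, since the
rounding pattern is identical (one double-precision rounding per
iteration):

  y <- 0
  for (i in 1:10) y <- y + x[i]  # each partial sum is rounded to double
  print(y, digits = 17)          # 0.99999999999999989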
>> y
> [1] 0.99999999999999989
>> y*10^6
> [1] 999999.99999999988
Note that 10 = 2*5 = b'0010' * b'0101' is not just a power of 2, so both
the binary mantissa and the binary exponent get modified when you multiply
by a power of 10, and the rounding may differ for different powers of 10.
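One way to see this (a sketch): scaling by a power of 2 is exact because
only the exponent changes, while scaling by a power of 10 rounds the
mantissa again:

  y <- 0.99999999999999989
  print((y * 2^20) / 2^20, digits = 17)  # 0.99999999999999989: exact round trip
  print(y * 10^6, digits = 17)           # 999999.99999999988: mantissa rounded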
>> sum(x)*10^6
> [1] 1e+06
>> z=.1+.1+.1+.1+.1+.1+.1+.1+.1+.1
More double precision.
>> z
> [1] 0.99999999999999989
Similar to the for loop.
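In fact the explicit chain of additions performs the same left-to-right
double-precision roundings as the loop, so (with y still holding the loop
result from above) the two are bit-for-bit identical:

  identical(z, y)  # TRUE: same sequence of roundings, same double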
Don't rely on the last few digits to be stable. Take a course in numerical
analysis if you just cannot take my word for it that those last few digits
are going to wander off a bit no matter what you do.
[1] https://en.wikipedia.org/wiki/IEEE_754
[2] https://en.wikipedia.org/wiki/Extended_precision
[3] https://cs.stackexchange.com/search?q=ieee+754
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k