[Rd] print(...,digits=2) behavior

Mon Feb 7 23:14:39 CET 2011

On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
> >>>>> Ben Bolker <bbolker at gmail.com>
> >>>>>     on Sat, 5 Feb 2011 15:58:09 -0500 writes:
> 
>     >   A bug was recently posted to the R bug database (which
>     > probably would better have been posted as a query here) as
>     > to why this happens:
> 
>     >> print(7.921,digits=2)
>     > [1] 8
>     >> print(7.92,digits=2)
>     > [1] 7.9
> 
>     >   Two things I *haven't* done to help make sense of this
>     > for myself are (1) writing out the binary representations
>     > to see if something obvious pops out about why this would
>     > be a breakpoint and (2) poking in the source code (I did a
>     > little bit of this but gave up).
> 
>     >   I know that confusion over rounding etc. is very common,
>     > and my first reaction to this sort of question is always
>     > that there must be some sort of confusion (either (1) in a
>     > FAQ 7.31-ish sort of way that floating point values have
>     > finite precision in the first place, or (2) a confusion
>     > over the difference between the value and the
>     > representation of the number, or (3) more subtly, about
>     > the differences among various choices of rounding
>     > conventions).
> 
>     >   However, in this case I am a bit stumped: I don't see
>     > that any of the standard confusions apply.  I grepped the
>     > R manuals for "rounding" and didn't find anything useful
>     > ...  I have a very strong prior that such a core part of R
>     > must be correct, and that therefore I (and the original
>     > bug reporter) must be misunderstanding something.
> 
>     >   Thoughts/references?
> 
> I had started to delve into this after you've posted the bug
> report. It is clearly a bug(let),
> caused by code that has been in  R  from its very
> beginning, at least in the first source code I've seen in 1994.
> 
> The problem is not related to digits=2,
> but using such a low number of digits shows it more
> dramatically, e.g., also
> 
>  > print(5.9994, digits=4)
>  [1] 5.999
>  > print(5.9994001, digits=4)
>  [1] 6

I think that this may be a consequence of the following
analysis, which i did some time ago using the source code
of scientific() function in src/main/format.c. The
conclusion was the following.

Printing to k digits looks for the smallest number of
digits l <= k so that the relative error of the printed
mantissa (significand) is at most 10^-k. If the mantissa
begins with a digit less than 5, then this condition is
stronger than having absolute error at most 5*10^-k. So,
in this case, we cannot get lower accuracy than rounding
to k digits.

If the mantissa begins with 5 or more, then having relative
error at most 10^-k is a weaker condition than having absolute
error at most 5*10^-k. In this case, the chosen l may be
smaller than k even if the printed number has larger error
than the number rounded to k digits.

This may be seen in the following example

  xx <- 8 + (6:10)*10^-7
  for (x in xx) print(x)

  [1] 8
  [1] 8
  [1] 8
  [1] 8.000001
  [1] 8.000001

where all the printed numbers are 8.000001 if rounded
to 7 digits.

The cases 

  print(7.921,digits=2)
  [1] 8

  print(7.92,digits=2)
  [1] 7.9

seem to have the same explanation. The relative errors of the
numbers rounded to a single digit are

  (8 - 7.921)/7.921
  [1] 0.009973488

  (8 - 7.92)/7.92
  [1] 0.01010101

In the first case, this is less than 10^-2 and so, l = 1
is used. In the second case, the relative error for l = 1
is larger than 10^-2 an so, l = 2 is chosen.

In the cases

  print(5.9994, digits=4)
  [1] 5.999
  print(5.9994001, digits=4)
  [1] 6

the relative errors of one digit output are

  (6 - 5.9994)/5.9994
  [1] 0.00010001

  (6 - 5.9994001)/5.9994001
  [1] 9.999333e-05

Here, one digit output is chosen if the relative error is
less than 10^-4 and not otherwise.

Petr Savicky.