[Rd] print(...,digits=2) behavior
Petr Savicky
savicky at cs.cas.cz
Mon Feb 7 23:14:39 CET 2011
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
> >>>>> Ben Bolker <bbolker at gmail.com>
> >>>>> on Sat, 5 Feb 2011 15:58:09 -0500 writes:
>
> > A bug was recently posted to the R bug database (which
> > probably would better have been posted as a query here) as
> > to why this happens:
>
> >> print(7.921,digits=2)
> > [1] 8
> >> print(7.92,digits=2)
> > [1] 7.9
>
> > Two things I *haven't* done to help make sense of this
> > for myself are (1) writing out the binary representations
> > to see if something obvious pops out about why this would
> > be a breakpoint and (2) poking in the source code (I did a
> > little bit of this but gave up).
>
> > I know that confusion over rounding etc. is very common,
> > and my first reaction to this sort of question is always
> > that there must be some sort of confusion (either (1) in a
> > FAQ 7.31-ish sort of way that floating point values have
> > finite precision in the first place, or (2) a confusion
> > over the difference between the value and the
> > representation of the number, or (3) more subtly, about
> > the differences among various choices of rounding
> > conventions).
>
> > However, in this case I am a bit stumped: I don't see
> > that any of the standard confusions apply. I grepped the
> > R manuals for "rounding" and didn't find anything useful
> > ... I have a very strong prior that such a core part of R
> > must be correct, and that therefore I (and the original
> > bug reporter) must be misunderstanding something.
>
> > Thoughts/references?
>
> I had started to delve into this after you've posted the bug
> report. It is clearly a bug(let),
> caused by code that has been in R from its very
> beginning, at least in the first source code I've seen in 1994.
>
> The problem is not related to digits=2,
> but using such a low number of digits shows it more
> dramatically, e.g., also
>
> > print(5.9994, digits=4)
> [1] 5.999
> > print(5.9994001, digits=4)
> [1] 6
I think that this may be a consequence of the following
analysis, which i did some time ago using the source code
of scientific() function in src/main/format.c. The
conclusion was the following.
Printing to k digits looks for the smallest number of
digits l <= k so that the relative error of the printed
mantissa (significand) is at most 10^-k. If the mantissa
begins with a digit less than 5, then this condition is
stronger than having absolute error at most 5*10^-k. So,
in this case, we cannot get lower accuracy than rounding
to k digits.
If the mantissa begins with 5 or more, then having relative
error at most 10^-k is a weaker condition than having absolute
error at most 5*10^-k. In this case, the chosen l may be
smaller than k even if the printed number has larger error
than the number rounded to k digits.
This may be seen in the following example
xx <- 8 + (6:10)*10^-7
for (x in xx) print(x)
[1] 8
[1] 8
[1] 8
[1] 8.000001
[1] 8.000001
where all the printed numbers are 8.000001 if rounded
to 7 digits.
The cases
print(7.921,digits=2)
[1] 8
print(7.92,digits=2)
[1] 7.9
seem to have the same explanation. The relative errors of the
numbers rounded to a single digit are
(8 - 7.921)/7.921
[1] 0.009973488
(8 - 7.92)/7.92
[1] 0.01010101
In the first case, this is less than 10^-2 and so, l = 1
is used. In the second case, the relative error for l = 1
is larger than 10^-2 an so, l = 2 is chosen.
In the cases
print(5.9994, digits=4)
[1] 5.999
print(5.9994001, digits=4)
[1] 6
the relative errors of one digit output are
(6 - 5.9994)/5.9994
[1] 0.00010001
(6 - 5.9994001)/5.9994001
[1] 9.999333e-05
Here, one digit output is chosen if the relative error is
less than 10^-4 and not otherwise.
Petr Savicky.
More information about the R-devel
mailing list