[Rd] print(...,digits=2) behavior

Martin Maechler maechler at stat.math.ethz.ch
Mon Feb 7 09:56:18 CET 2011


>>>>> Ben Bolker <bbolker at gmail.com>
>>>>>     on Sat, 5 Feb 2011 15:58:09 -0500 writes:

    >   A bug was recently posted to the R bug database (which
    > probably would better have been posted as a query here) as
    > to why this happens:

    >> print(7.921,digits=2)
    > [1] 8
    >> print(7.92,digits=2)
    > [1] 7.9

    >   Two things I *haven't* done to help make sense of this
    > for myself are (1) writing out the binary representations
    > to see if something obvious pops out about why this would
    > be a breakpoint and (2) poking in the source code (I did a
    > little bit of this but gave up).

    >   I know that confusion over rounding etc. is very common,
    > and my first reaction to this sort of question is always
    > that there must be some sort of confusion (either (1) in a
    > FAQ 7.31-ish sort of way that floating point values have
    > finite precision in the first place, or (2) a confusion
    > over the difference between the value and the
    > representation of the number, or (3) more subtly, about
    > the differences among various choices of rounding
    > conventions).

    >   However, in this case I am a bit stumped: I don't see
    > that any of the standard confusions apply.  I grepped the
    > R manuals for "rounding" and didn't find anything useful
    > ...  I have a very strong prior that such a core part of R
    > must be correct, and that therefore I (and the original
    > bug reporter) must be misunderstanding something.

    >   Thoughts/references?

I had started to delve into this after you've posted the bug
report. It is clearly a bug(let),
caused by code that has been in  R  from its very
beginning, at least in the first source code I've seen in 1994.

The problem is not related to digits=2,
but using such a low number of digits shows it more
dramatically, e.g., also

 > print(5.9994, digits=4)
 [1] 5.999
 > print(5.9994001, digits=4)
 [1] 6

Interestingly, the problem seems *not* to be present for
digits = 1 (only).

I haven't found time to mathematically "analyze" it for
determining a correct solution though.
Note that fixing this bug(let) will probably (very slightly)
change a huge number of R outputs .. so there is a caveat,
but nonetheless, we must approach it.

The responsible code is the C function  scientific()
in src/main/format.c 

a somewhat direct R interface to its functionality is given by
R's format.info() :

  > format.info(7.92, digits=2)
  [1] 3 1 0
  > format.info(7.921, digits=2)
  [1] 1 0 0

which means that 7.92 will use 3 characters, 1 digit after the decimal
7.921 will use 1 character, 0 digits after decimal.

The crucial "buggy" part of the code is  {line 180 ff} :

	/* compute number of digits */

	*nsig = R_print.digits;
	for (j = 1; j <= *nsig; j++) {
	    if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
		*nsig = j;
		break;
	    }
	    alpha *= 10.0;
	}

where notably
	    if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
is not quite the correct check.
Note that  eps = 10^-2  in our case and  alpha = 7.92 or 7.921
and that  8 - 7.921 = 0.079 <  7.921/10^2  is just at the border.

Looking for "all the border cases" is solving the above
analytically with '=' and setting  k = ceiling(alpha) :

> p0 <- function(...) paste(..., sep="")
> k <- 6:9; names(k) <- p0("k=",k)
> d <- 1:5; names(d) <- p0("d=",d)
> xc <- outer(k, 1+10^-d, "/")
> print(xc, digits= 11)
             d=1          d=2         d=3        d=4          d=5
k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.9999400006
k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.9999300007
k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.9999200008
k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.9999100009

So we see that for our case, 7.920792... is more exactly the
border case.

One change that is *not* correct is making it an absolute
instead of relative comparison, i.e.  replacing the RHS
eps * alpha  by  eps

Namely, one important purpose of the test is to ensure that e.g.

  print(3.597, digits = 3)

is printed as  3.6   and not 3.60

Now I have had -- since 1997 at least -- an R version of
scientific() for more easy experimentation.
Here's an updated version of that:

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scientific.R
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20110207/d05c4b37/attachment.pl>
-------------- next part --------------

and here some of the experimentation code:

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scientific-ex.R
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20110207/d05c4b37/attachment-0001.pl>
-------------- next part --------------

-------

Now, I'm waiting for comments/suggestions ..
Martin


More information about the R-devel mailing list