[Rd] significant digits (PR#9682)
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Jun 3 23:12:52 CEST 2008
On 6/3/2008 4:36 PM, Simon Urbanek wrote:
> On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
>
>> On 6/3/2008 11:43 AM, Patrick Carr wrote:
>>> On 6/3/08, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>
>>>> because signif(0.90, digits=2) == 0.9. Those two objects are
>>>> identical.
>>> My text above that is poorly worded. They're identical internally,
>>> yes. But in terms of the number of significant digits, 0.9 and 0.90
>>> are different. And that matters when the number is printed, say as an
>>> annotation on a graph. Passing it through sprintf() or format() later
>>> requires you to specify the number of digits after the decimal, which
>>> is different than the number of significant digits, and requires case
>>> testing for numbers of different orders of magnitude.
>>> The original complainant (and I) expected this behavior from
>>> signif(),
>>> not merely rounding. As I said before, I wrote my own workaround so
>>> this is somewhat academic, but I don't think we're alone.
>>>> As far as I know, rounding is fine in Windows:
>>>>
>>>> > round(1:10 + 0.5)
>>>> [1] 2 2 4 4 6 6 8 8 10 10
>>>>
>>> It might not be the rounding, then. (windows xp sp3)
>>> > signif(12345,digits=4)
>>> [1] 12340
>>> > signif(0.12345,digits=4)
>>> [1] 0.1235
>>
>> It's easy to make mistakes in this, but a little outside-of-R
>> experimentation suggests those are the right answers. The number
>> 12345 is exactly representable, so it is exactly half-way between
>> 12340 and 12350, so 12340 is the right answer by the unbiased round-
>> to-even rule. The number 0.12345 is not exactly representable, but
>> (I think) it is represented by something slightly closer to 0.1235
>> than to 0.1234. So it looks as though Windows gets it right.
>>
>>
>>> OS X (10.5.2/intel) does not have that problem.
>>
>> Which would seem to imply OS X gets it wrong.
>
> This has nothing to do with OS X, you get that same answer on pretty
> much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
> Windows is the only one delivering the incorrect result here.
>
>
>> Both are supposed to be using the 64 bit floating point standard,
>> so they should both give the same answer:
>
> Should, yes, but Windows doesn't. In fact 10000.0 is exactly
> representable and so is 1234.5 which is the correct result that all
> except Windows get.
I think you skipped a step. The correct answer is either 0.1234 or
0.1235, not something 10000 times bigger. The first important question
is whether 0.12345 is exactly representable, and the answer is no. The
second question is whether it is represented by a number bigger or
smaller than the real number 0.12345. If it is bigger, the answer
should be 0.1235, and if it is smaller, the answer is 0.1234. My
experiments suggest it is bigger. Yours don't look relevant. It
certainly isn't exactly equal to 1234.5/10000, because that number is
not representable. It's equal to x/2^y, for some x and y, and it's a
pain to figure out exactly what they are.
However, I am pretty sure R is representing it (at least on Windows) as
the binary expansion
0.000111111001101001101011010100001011000011110010011111
while the true binary expansion (using exact rational arithmetic) starts
out
0.00011111100110100110101101010000101100001111001001111011101...
If you line those up, you'll see that the first number is bigger than
the second. (Ugly code to derive these is down below.)
Clearly the top representation is the correct one to that number of
binary digits, so I think Windows got it right, and all those other
systems didn't. This is probably because R on Windows is using extended
precision (64 bit mantissas) for intermediate results, and those other
systems stick with 53 bit mantissas.
Duncan Murdoch
# Convert number to binary expansion; add the decimal point manually
x <- 0.12345
while (x != 0) {
cat(trunc(x))
x <- x - trunc(x)
x <- x * 2
}
# Do the same thing in exact rational arithmetic
num <- 12345
denom <- 100000
for (i in 1:60) {
cat(ifelse(num > 100000, "1", "0"))
num <- num %% 100000
num <- 2*num
}
# Manually cut and paste the results to get these:
"0.000111111001101001101011010100001011000011110010011111"
"0.00011111100110100110101101010000101100001111001001111011101"
More information about the R-devel
mailing list