[Rd] [R] bug when subtracting decimals?
Marc Schwartz
marc_schwartz at me.com
Wed Apr 22 15:56:07 CEST 2009
On Apr 22, 2009, at 4:49 AM, Martin Maechler wrote:
>>>>>> "MS" == Marc Schwartz <marc_schwartz at me.com>
>>>>>> on Tue, 21 Apr 2009 08:06:46 -0500 writes:
>
>
> MS> It does look like R's behavior has changed since then. Using:
>
> MS> R version 2.9.0 Patched (2009-04-18 r48348)
>
> MS> on OSX:
>
> MS> # This first example has changed.
> MS> # Prior result was 414.99999999999994
>>> print(4.145 * 100 + 0.5, digits = 20)
> MS> [1] 415
>
>>> formatC(4.145 * 100 + 0.5, format = "E", digits = 20)
> MS> [1] "4.14999999999999943157E+02"
>
>>> print(0.5 - 0.4 - 0.1, digits = 20)
> MS> [1] -2.77555756156289e-17
>
>>> formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)
> MS> [1] "-2.77555756156289135106E-17"
>
>
> MS> What is interesting is that:
>
>>> 4.145 * 100 + 0.5 == 415
> MS> [1] FALSE
>
>>> (4.145 * 100 + 0.5) - 415
> MS> [1] -5.684342e-14
>
>>> all.equal(4.145 * 100 + 0.5, 415, 0)
> MS> [1] "Mean relative difference: 1.369721e-16"
>
>
> MS> So it would appear that in the first R example above, the print()
> MS> function has changed in a material fashion.
>
> Yes ((though not with *my* vote...)).
> However, be aware that such calculations *are* platform
> dependent, and IIUC, you are now using OS X whereas you've used
> another platform previously, so some of the differences you see
> may not be from changes in R, but from changes in the platform
> you use.
> Back to the topic of print():
> Actually, also format(<numeric>) has changed similarly to my
> chagrin.
> In older versions of R, you could ask it to give "too many" digits,
> but now it gives "too few" even for maximal 'digits'.
> {There is a good reason - which I don't recall - for the new behavior}
>
> With as.character() it was worse (in older R versions): it gave
> sometimes too few digits, sometimes too many, whereas now it
> is at least consistently giving "too few".
> But the effect is that in ch <- as.character(x) ,
> ch may contain duplicated entries even for unique x,
> e.g., for x <- c(1, 1 + 4e-16)
>
> BTW, one alternative to {"my"} formatC() is sprintf(),
> and if you are really interested: The latest changes (in 2.10.0 R-devel),
> ensuring unique factor levels actually now make use of
> sprintf("%.17g", .)
> instead of as.character(.) exactly in order to ensure that
> different numbers map to different strings necessarily.
>
> BTW, we are way off topic for R-help, being in R-devel realm,
> but as this thread has started here, we may keep it...
>
> Martin Maechler, ETH Zurich
>
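For anyone following along, here is a minimal sketch of the point Martin makes about as.character() versus sprintf("%.17g", .). The variable names (ch_short, ch_full) are mine and purely illustrative; this is not the code used in R-devel for factor levels, and the as.character() output may differ between R versions.

```r
# Two doubles that differ only in the last few bits.
x <- c(1, 1 + 4e-16)

# The numbers themselves are distinct.
distinct_numbers <- x[1] != x[2]

# as.character() uses a limited number of significant digits, so per the
# discussion above the two strings may come out identical.
ch_short <- as.character(x)

# 17 significant digits are enough to give distinct doubles distinct strings.
ch_full <- sprintf("%.17g", x)

print(distinct_numbers)
print(ch_short)
print(ch_full)
print(anyDuplicated(ch_full))  # 0 means no duplicated strings
```

The design choice is simply to ask for enough digits (17 for IEEE doubles) so that the mapping from number to string cannot merge two different values.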
Thanks for replying, Martin.
While I appreciate your comment above, I am moving to r-devel given
the content. I agree that we are getting into low-level subject matter.
FWIW, I grabbed my dusty old Dell laptop running Fedora 10 out of the
closet and booted it up.
I get the same behavior as above there with R 2.8.1 patched.
So this would suggest that it is not an OS issue, but indeed a change
in R.
I did try to build R 1.7.1 (the version used in the prior examples
almost 6 years ago) on OSX, but it would appear that things have
changed sufficiently in the intervening time frame as to preclude a
successful build. I suspect much of the issue may be that Apple moved
to Intel CPUs only about 4 years ago, so perhaps configuring
older versions of R on OSX for Intel would require much work, which is
not worth it here. I would of course defer to others with more
in-depth knowledge on that point.
I did not see anything in any of the *NEWS files, but the help for
print() does reference:
Warning
Using too large a value of digits may lead to representation errors in
the calculation of the number of significant digits and the decimal
representation: these are likely for digits >= 16, and these possible
errors are taken into account in assessing the number of significant
digits to be printed in that case.
Whereas earlier versions of R might have printed further digits for
digits >= 16 on some platforms, they were not necessarily reliable.
While I don't want to re-visit what from your comments appears to be a
sensitive subject, I do want to point out that this new behavior
arguably masks aspects of the original subject matter of the thread
from users. It also results in inconsistent behavior when compared to
the output of the other floating point comparisons I used, which
suggest that the result of the operation is not an integer, which will
serve to further confuse folks.
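To make the inconsistency concrete, here is a rough sketch of how one might look past print() at the stored value. It only uses functions already mentioned in this thread (formatC, sprintf) plus format() and round(); the variable names are hypothetical, and the exact digits shown can depend on the platform and R version, as Martin notes.

```r
x <- 4.145 * 100 + 0.5

# What a user typically sees: a limited number of significant digits.
shown <- format(x, digits = 15)

# Asking for 17 significant digits bypasses print()'s digit assessment.
full <- sprintf("%.17g", x)
sci  <- formatC(x, format = "E", digits = 20)

# Numeric checks that do not depend on how the value is formatted.
is_whole <- x == round(x)  # FALSE in the sessions quoted above
gap      <- x - 415        # tiny but non-zero in the sessions quoted above

print(c(shown = shown, full = full, sci = sci))
print(is_whole)
print(gap)
```

The formatted strings and the numeric checks can disagree about whether x "is" 415, which is the confusion I am describing.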
Is there some reasonable compromise to be had here such that
consistent and predictable behavior is possible in this realm,
especially given how frequently this fundamental subject comes up?
We of course don't need examples as complicated as the one above and
can use the more common:
> print(0.5 - 0.4, 20)
[1] 0.1
> 0.5 - 0.4 == 0.1
[1] FALSE
> all.equal(0.5 - 0.4, 0.1, 0)
[1] "Mean relative difference: 2.775558e-16"
So arguably, we are talking about boundary situations here.
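As a sketch of what we usually recommend to users in this situation, a tolerance-based comparison avoids relying on exact equality. The helper name near_equal is hypothetical, not an existing R function; it just wraps all.equal(), whose default tolerance is sqrt(.Machine$double.eps).

```r
# Hypothetical helper: TRUE when a and b agree within a relative tolerance.
near_equal <- function(a, b, tol = sqrt(.Machine$double.eps)) {
  isTRUE(all.equal(a, b, tolerance = tol))
}

d <- 0.5 - 0.4

exact  <- d == 0.1                    # exact comparison, FALSE as shown above
approx <- near_equal(d, 0.1)          # default tolerance, about 1.5e-8
strict <- near_equal(d, 0.1, tol = 0) # zero tolerance behaves like ==

print(c(exact = exact, approx = approx, strict = strict))
```

With tol = 0, all.equal() returns the "Mean relative difference" message rather than TRUE, so isTRUE() turns it into FALSE.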
Thanks Martin!
Marc