[Rd] [R] bug when subtracting decimals?

Wed Apr 22 15:56:07 CEST 2009

On Apr 22, 2009, at 4:49 AM, Martin Maechler wrote:

>>>>>> "MS" == Marc Schwartz <marc_schwartz at me.com>
>>>>>>    on Tue, 21 Apr 2009 08:06:46 -0500 writes:
>
>
>    MS> It does look like R's behavior has changed since then. Using:
>
>    MS> R version 2.9.0 Patched (2009-04-18 r48348)
>
>    MS> on OSX:
>
>    MS> # This first example has changed.
>    MS> # Prior result was 414.99999999999994
>    MS> > print(4.145 * 100 + 0.5, digits = 20)
>    MS> [1] 415
>
>    MS> > formatC(4.145 * 100 + 0.5, format = "E", digits = 20)
>    MS> [1] "4.14999999999999943157E+02"
>
>    MS> > print(0.5 - 0.4 - 0.1, digits = 20)
>    MS> [1] -2.77555756156289e-17
>
>    MS> > formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)
>    MS> [1] "-2.77555756156289135106E-17"
>
>
>    MS> What is interesting is that:
>
>    MS> > 4.145 * 100 + 0.5 == 415
>    MS> [1] FALSE
>
>    MS> > (4.145 * 100 + 0.5) - 415
>    MS> [1] -5.684342e-14
>
>    MS> > all.equal(4.145 * 100 + 0.5, 415, 0)
>    MS> [1] "Mean relative difference: 1.369721e-16"
>
>
>    MS> So it would appear that in the first R example above, the
>    MS> print() function has changed in a material fashion.
>
> Yes  ((though not with *my* vote...)).
> However, be aware that such calculations *are* platform
> dependent, and IIUC, you are now using OS X whereas you've used
> another platform previously, so some of the differences you see
> may not be from changes in R, but from changes in the platform
> you use.

> Back to the topic of print():
> Actually, also  format(<numeric>)  has changed similarly to my chagrin.
> In older versions of R, you could ask it to give "too many" digits,
> but now it gives "too few" even for maximal 'digits'.
> {There is a good reason - which I don't recall - for the new behavior}
>
> With as.character() it was worse (in older R versions): it sometimes
> gave too few digits, sometimes too many, whereas now it
> is at least consistently giving "too few".
> But the effect is that in  ch <- as.character(x) ,
> ch may contain duplicated entries even for unique x,
> e.g., for x <- c(1, 1 + 4e-16)
>
> BTW, one alternative to {"my"}  formatC() is  sprintf(),
> and if you are really interested: The latest changes (in 2.10.0 R-devel),
> ensuring unique factor levels actually now make use of
> 	 sprintf("%.17g", .)
> instead of as.character(.) exactly in order to ensure that
> different numbers map to different strings necessarily.
>
> BTW, we are way off topic for R-help, being in R-devel realm,
> but as this thread has started here, we may keep it...
>
> Martin Maechler, ETH Zurich
>

Thanks for replying Martin.

While I appreciate your comment above, I am moving to r-devel given  
the content. I agree that we are getting into low level subject matter.

FWIW, I grabbed my dusty old Dell laptop running Fedora 10 out of the  
closet and booted it up.

I get the same behavior as above there with R 2.8.1 patched.

So this would suggest that it is not an OS issue, but indeed a change
in R.

I did try to build R 1.7.1 (the version used in the prior examples
almost 6 years ago) on OSX, but things appear to have changed enough in
the intervening time to preclude a successful build. I suspect much of
the issue may be that Apple moved to Intel CPUs only about 4 years ago,
so configuring older versions of R for OSX on Intel would perhaps
require much work which is not worth it here. I would of course defer
to others with more in-depth knowledge on that point.

I did not see anything in any of the *NEWS files, but the help for  
print() does reference:

Warning
Using too large a value of digits may lead to representation errors in  
the calculation of the number of significant digits and the decimal  
representation: these are likely for digits >= 16, and these possible  
errors are taken into account in assessing the number of significant  
digits to be printed in that case.

Whereas earlier versions of R might have printed further digits for  
digits >= 16 on some platforms, they were not necessarily reliable.
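
To make the digits >= 16 point concrete, here is a minimal sketch that uses sprintf(), which Martin mentions above, rather than print(). It assumes IEEE 754 doubles. The vector x is Martin's own example; s15 and s17 are names chosen just for this illustration. With 15 significant digits the two distinct values collapse to the same string, whereas 17 significant digits keep them apart.

```r
# Martin's example: two distinct doubles that differ by about 4.4e-16.
x <- c(1, 1 + 4e-16)

# 15 significant digits: both values are rendered identically.
s15 <- sprintf("%.15g", x)

# 17 significant digits: enough to give distinct doubles distinct strings.
s17 <- sprintf("%.17g", x)

x[1] == x[2]          # FALSE: the values really are different
anyDuplicated(s15)    # non-zero: the 15-digit strings collide
anyDuplicated(s17)    # 0: the 17-digit strings are unique
```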

While I don't want to re-visit what from your comments appears to be a
sensitive subject, I do want to point out that this new behavior
arguably hides from users the very issue that started this thread. It is
also inconsistent with the other floating point comparisons I used
above, which show that the result of the operation is not an integer.
That inconsistency will serve to further confuse folks.
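
As a sketch of what consistent output could look like, the following shows the value at 17 significant digits next to the results of the comparisons. It assumes IEEE 754 double arithmetic (on other platforms the digits may differ, as Martin notes), and val is just an illustrative name.

```r
val <- 4.145 * 100 + 0.5

# Display with 17 significant digits, so the non-integer value is visible.
sprintf("%.17g", val)

# The comparisons agree with that display: val is slightly below 415.
val == 415            # FALSE
val - 415             # small negative difference, about -5.7e-14
val < 415             # TRUE
```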

Is there some reasonable compromise to be had here such that  
consistent and predictable behavior is possible in this realm,  
especially given how frequently this fundamental subject comes up?

We of course don't need examples as complicated as the one above and  
can use the more common:

 > print(0.5 - 0.4, 20)
[1] 0.1

 > 0.5 - 0.4 == 0.1
[1] FALSE

 > all.equal(0.5 - 0.4, 0.1, 0)
[1] "Mean relative difference: 2.775558e-16"

So arguably, we are talking about boundary situations here.
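
For such boundary cases, comparing with a tolerance is the usual way around exact equality. A minimal sketch follows. Note that near is a hypothetical helper name introduced here, not a base R function, and the default tolerance of all.equal() is assumed to be appropriate for the values at hand.

```r
# Hypothetical helper: TRUE when a and b agree within all.equal()'s
# default tolerance (about 1.5e-8 for doubles).
near <- function(a, b) isTRUE(all.equal(a, b))

0.5 - 0.4 == 0.1                 # FALSE: exact comparison
near(0.5 - 0.4, 0.1)             # TRUE: equal within tolerance
near(4.145 * 100 + 0.5, 415)     # TRUE
near(0.1, 0.2)                   # FALSE: genuinely different values
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description, not FALSE, when the values differ.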

Thanks Martin!

Marc