[R] bug when subtracting decimals?

Martin Maechler maechler at stat.math.ethz.ch
Wed Apr 22 11:49:27 CEST 2009


>>>>> "MS" == Marc Schwartz <marc_schwartz at me.com>
>>>>>     on Tue, 21 Apr 2009 08:06:46 -0500 writes:

    MS> On Apr 21, 2009, at 5:55 AM, Duncan Murdoch wrote:
    >> On 21/04/2009 3:48 AM, Petr PIKAL wrote:
    >>> Hi
    >>> r-help-bounces at r-project.org napsal dne 20.04.2009 19:01:46:
    >>>> wolfgang.siewert <wolfgang.siewert <at> gmail.com> writes:
    >>>> 
    >>>>> There is a way around: round(0.7-0.3,1)==0.4
    >>>>> (TRUE)
    >>>>> 
    >>>>> Obviously there is a problem with some combinations of decimal
    >>>>> subtractions, that - we have the feeling - should be solved.
    >>>> Oh no, not that one again! This was lecture two in my first computer
    >>>> course in 1968, but it seems to have gone the way of the dodo
    >>>> since then.
    >>> Maybe that is because Excel is so widespread now and gives  
    >>> expected results (it probably silently rounds all decimal numbers  
    >>> before calculation).
    >> 
    >> I don't have Excel, but I expect OpenOffice duplicates its bugs  
    >> pretty well.  And in OpenOffice I see all sorts of bugs due to this,  
    >> e.g. examples where x = y and y = z but x != z, cases where I can  
    >> calculate a number like 1 + 4.e-15 and get something different from  
    >> 1, but if I enter it directly as 1.000000000000004, it gets changed  
    >> to 1.
    >> 
    >> So it only gives expected results in some tests, not others.
    >> 
    >> Duncan Murdoch



    MS> As Dieter noted from our offlist exchange, this had been discussed  
    MS> previously back in 2003. Just to refresh memories:

    MS> https://stat.ethz.ch/pipermail/r-help/2003-June/034565.html

    MS> https://stat.ethz.ch/pipermail/r-help/2003-June/034860.html


    MS> OO.org has replicated Excel's behavior to a fault.  Thus:

    MS> Spreadsheet Use -> Brain to Porridge


    MS> Just to update OO.org's behavior using version 3.0.1 on OSX:

    MS> Formula: =4.145 * 100 + 0.5     Result: 415.00000000000000000000

    MS> Formula: =0.5 - 0.4 - 0.1       Result: 0.00000000000000000000

    MS> Formula: =(0.5 - 0.4 - 0.1)     Result: 0.00000000000000000000

    MS> So nothing has changed in OO.org in five years.  Somebody with Excel  
    MS> 2007 might want to try the 2nd and 3rd formula examples to see if  
    MS> using parens still makes a difference in the result as compared to the  
    MS> formula without the parens.


    MS> FWIW, now that I am on OSX, I can add the following output using  
    MS> Numbers '09:

    MS> Formula: =4.145 * 100 + 0.5     Result: 415.00000000000000000000

    MS> Formula: =0.5 - 0.4 - 0.1       Result: -2.77556E-17

    MS> Formula: =(0.5 - 0.4 - 0.1)     Result: -2.77556E-17


    MS> It does look like R's behavior has changed since then. Using:

    MS> R version 2.9.0 Patched (2009-04-18 r48348)

    MS> on OSX:

    MS> # This first example has changed.
    MS> # Prior result was 414.99999999999994
    >> print(4.145 * 100 + 0.5, digits = 20)
    MS> [1] 415

    >> formatC(4.145 * 100 + 0.5, format = "E", digits = 20)
    MS> [1] "4.14999999999999943157E+02"

    >> print(0.5 - 0.4 - 0.1, digits = 20)
    MS> [1] -2.77555756156289e-17

    >> formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)
    MS> [1] "-2.77555756156289135106E-17"


    MS> What is interesting is that:

    >> 4.145 * 100 + 0.5 == 415
    MS> [1] FALSE

    >> (4.145 * 100 + 0.5) - 415
    MS> [1] -5.684342e-14

    >> all.equal(4.145 * 100 + 0.5, 415, 0)
    MS> [1] "Mean relative difference: 1.369721e-16"


    MS> So it would appear that in the first R example above, the print()  
    MS> function has changed in a material fashion.

Yes  ((though not with *my* vote...)).
However, be aware that such calculations *are* platform
dependent, and IIUC, you are now using OS X whereas you've used
another platform previously, so some of the differences you see
may not be from changes in R, but from changes in the platform
you use.
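
Because the last bits of such results can differ between platforms,
a comparison that is meant to hold everywhere is better written with
a tolerance than with '=='.  A minimal sketch, using the 0.7 - 0.3
example from the start of this thread; near() is a hypothetical
helper written only for illustration and is not part of base R:

```r
# Sketch: compare doubles with a tolerance instead of '=='.
x <- 0.7 - 0.3
y <- 0.4

exact  <- (x == y)                 # exact comparison of the two doubles
approx <- isTRUE(all.equal(x, y))  # default tolerance is about 1.5e-8

# near() is a hypothetical helper, not a base R function
near <- function(a, b, tol = sqrt(.Machine$double.eps)) abs(a - b) < tol

print(c(exact = exact, approx = approx, near = near(x, y)))
```

all.equal() returns a character string rather than FALSE when the
values differ, hence the isTRUE() wrapper.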

Back to the topic of print():
Actually, also  format(<numeric>)  has changed similarly to my chagrin.
In older versions of R, you could ask it to give "too many" digits,
but now it gives "too few" even for maximal 'digits'.
{There is a good reason - which I don't recall - for the new behavior}
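
To see the difference between the "rounded" and the "full" view of
one number, something like the following sketch can be used.  It
reuses the 4.145 * 100 + 0.5 example from above; what print() and
format() show for larger 'digits' depends on the R version and the
platform, so only a 15-digit request is used here and no output for
other settings is claimed:

```r
# Sketch: the same double shown at different precisions.
x <- 4.145 * 100 + 0.5

shown <- format(x, digits = 15)   # rounded to 15 significant digits
full  <- sprintf("%.20E", x)      # C-style formatting, 20 decimals

print(shown)
print(full)
print(formatC(x, format = "E", digits = 20))  # formatC() analogue of 'full'
print(x - 415)                    # the remaining difference to 415
```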

With as.character() it was worse (in older R versions): it gave
sometimes too few digits, sometimes too many, whereas now it
is at least consistently giving "too few".
But the effect is that in  ch <- as.character(x) ,
ch may contain duplicated entries even for unique x,
e.g., for x <- c(1, 1 + 4e-16)
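
A small sketch to check this on your own installation; whether the
two strings actually collide depends on the R version, so no output
is shown here:

```r
# Sketch: distinct doubles whose character forms may coincide.
x  <- c(1, 1 + 4e-16)          # two different doubles
ch <- as.character(x)

print(length(unique(x)))       # number of distinct numeric values
print(ch)                      # their character representations
print(anyDuplicated(ch) > 0)   # TRUE if the strings collide
```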

BTW, one alternative to {"my"}  formatC() is  sprintf(), 
and if you are really interested: The latest changes (in 2.10.0 R-devel),
ensuring unique factor levels actually now make use of
	 sprintf("%.17g", .)
instead of as.character(.) exactly in order to ensure that
different numbers map to different strings necessarily.
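
As an illustration of why 17 significant digits are used, here is a
sketch comparing "%.15g" and "%.17g" on the two values from the start
of this thread.  It only demonstrates the sprintf() formats; it does
not reproduce the actual factor() code:

```r
# Sketch: 15 vs. 17 significant digits for two neighbouring doubles.
x <- c(0.4, 0.7 - 0.3)         # differ only in the last bit

s15 <- sprintf("%.15g", x)     # both values round to the same string
s17 <- sprintf("%.17g", x)     # enough digits to tell them apart

print(s15)
print(s17)
print(c(unique15 = length(unique(s15)), unique17 = length(unique(s17))))
```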

BTW, we are way off topic for R-help, being in R-devel realm,
but as this thread has started here, we may keep it...

Martin Maechler, ETH Zurich

    MS> HTH,
    MS> Marc Schwartz



