[R] Odp: precision issue?
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Mar 4 14:35:42 CET 2010
On 04/03/2010 7:35 AM, (Ted Harding) wrote:
> On 04-Mar-10 10:50:56, Petr PIKAL wrote:
> > Hi
> >
> > r-help-bounces at r-project.org napsal dne 04.03.2010 10:36:43:
> >> Hi R Gurus,
> >>
> >> I am trying to figure out what is going on here.
> >>
> >> > a <- 68.08
> >> > b <- a-1.55
> >> > a-b
> >> [1] 1.55
> >> > a-b == 1.55
> >> [1] FALSE
> >> > round(a-b,2) == 1.55
> >> [1] TRUE
> >> > round(a-b,15) == 1.55
> >> [1] FALSE
> >>
> >> Why should (a - b) == 1.55 fail when in fact b has been defined
> >> to be a - 1.55? Is this a precision issue? How do i correct this?
> >
> > In real world those definitions of b are the same but not in computer
> > world. See FAQ 7.31
> >
> > Use either rounding or all.equal.
> >
> >> all.equal(a-b, 1.55)
> > [1] TRUE
> >
> > To all, this is quite common question and it is documented in FAQs.
> > programs, therefore maybe a confusion from novices.
> >
> > I wonder if there could be some type of global option which will
> > get rid of these users mistakes or misunderstandings by setting
> > some threshold option for equality testing by use "==".
> >
> > Regards
> > Petr
> >
> >> Alex
>
> Interesting suggestion, but in my view it would probably give
> rise to more problems than it would avoid!
>
> The fundamental issue is that many inexperienced users are not
> aware that once 68.08 has got inside the computer (as used by
> R and other programs which do fixed-length binary arithmetic)
> it is no longer 68.08 (though 1.55 is still 1.55).
>
I think most of what you write above and below is true, but your
parenthetical remark is not. 1.55 can't be represented exactly in the
double precision floating point format used in R. It doesn't have a
terminating binary expansion, so it will be rounded to a binary
fraction. I believe you can see the binary expansion using this code:
x <- 1.55
for (i in 1:70) {
cat(floor(x))
if (i == 1) cat(".")
x <- 2*(x - floor(x))
}
which gives
1.100011001100110011001100110011001100110011001100110100000000000000000
Notice how it becomes a repeating expansion, with 0011 repeated 12
times, but then it finishes with 0100000000000000000, because we've run
out of bits.
So 1.55 is actually stored as a number which is ever so slightly bigger.
Duncan Murdoch
> Since "==" tests for equality of stored binary representations,
> it is inevitable that it will often return FALSE when the user
> would "logically" expect TRUE (as in Alex's query above). When
> a naive users encounters this, and is led to raise a query on
> the list, a successful reply will have the effect that yet one
> more user has learned something. This is useful.
>
> As the help page ?Comparison (also accessible from ?"==" etc.)
> states:
>
> Do not use '==' and '!=' for tests, such as in 'if' expressions,
> where you must get a single 'TRUE' or 'FALSE'. Unless you are
> absolutely sure that nothing unusual can happen, you should use
> the 'identical' function instead.
>
> For numerical and complex values, remember '==' and '!=' do not
> allow for the finite representation of fractions, nor for rounding
> error. Using 'all.equal' with 'identical' is almost always
> preferable. See the examples.
>
> It can on occasion be useful to be able to test for exact equality
> of internal binary representations. Also, what is really going on
> when "==" appears to fail can be ascertained by evaluating the
> within-computer difference between allegedly equivalent expressions.
> Thus, for instance:
>
> a <- 68.08
> b <- a-1.55
> a-b == 1.55
> # [1] FALSE
> (a-b) - 1.55
> # [1] -2.88658e-15
>
> all.equal((a-b), 1.55)
> # [1] TRUE
>
> I think that introducing a default "tolerance" option to "==",
> thus making it masquerade as all.equal(), would both suppress
> the capability of "==" to test exact equality of internal
> representations, and contribute to persistence of misconceptions
> by naive users. At least the current situation contributes to
> removing the latter.
>
> The result of the latter could well be that a complicated
> program goes astray because a deep-lying use of "==" gives
> the "equal within tolerance" result rather than the "not
> exactly equal" result, leading to some really difficult
> queries to the list!
>
> Of course, the use of "all.equal((a-b), 1.55)" is a lot longer
> than the use of "(a-b) == 1.55", and there is an understandable
> argument for something as snappy as "==" for testing approximate
> equality. But I think that subverting "==" is the wrong way to
> go about it. Leave "==" alone and, according to taste, introduce
> a new operator (which the user can define for himself anyway),
> say "%==%":
>
> "%==%" <- function(x,y){ all.equal(x,y) }
> (a-b) %==% 1.55
> # [1] TRUE
>
> Or perhaps
>
> "%==%" <- function(x,y){ identical(all.equal(x,y),True) }
>
> as in the Example in ?Comparison. Then you have a snappy shorthand,
> and it operates exactly as all.equal() does and with the same
> tolerance, and it won't break anything else as a result of
> subverting "==".
>
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 04-Mar-10 Time: 12:35:08
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list