[R] Strange behaviour of as.integer()

Thu Jan 7 14:32:34 CET 2010

On 07-Jan-10 12:31:42, Ulrich Keller wrote:
> I have encountered a strange behaviour of as.integer() which
> does not seem correct to me. Sorry if this is just an indication
> of me not understanding floating point arithmetic.

I'm afraid it probably is -- but being aware of what the problem
is, is 0.875 of solving it (sticking to binary-compatible fractions)!
See below.

>> .57 * 100
> [1] 57
>> .29 * 100
> [1] 29

So it seems, but:

  57 - .57 * 100
  # [1] 7.105427e-15
  (.57 * 100 < 57)
  # [1] TRUE
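
One way to see this for yourself is to ask R for more digits than the
default seven. This is only a small sketch: the variable name x is my own,
and the exact digits printed may differ slightly between platforms.

```r
x <- .57 * 100

print(x, digits = 17)   # more significant digits than the default 7
sprintf("%.17g", x)     # the same idea via C-style formatting

x == 57                 # FALSE: the stored value is not exactly 57
x < 57                  # TRUE: it lies just below 57
```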

So things are not what they seem. Now:

> So far, so good. But:
> 
>> as.integer(.57 * 100)
> [1] 56
>> as.integer(.29 * 100)
> [1] 28

But if you look at ?as.integer you see:

  "Non-integral numeric values are truncated towards zero
   (i.e., 'as.integer(x)' equals 'trunc(x)' there)"

so, since .57 * 100 is stored as the equivalent of 56.999<something>,
its fractional part is discarded, resulting in 56.
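
If what you want is the nearest integer rather than truncation, one
common approach (a sketch, not the only way) is to round before
converting. The variable name x is just for illustration.

```r
x <- .57 * 100

trunc(x)               # 56: truncation towards zero
as.integer(x)          # 56: same as trunc() for non-integral values
round(x)               # 57: nearest whole number, still a double
as.integer(round(x))   # 57: round first, then convert
```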

> Then again:
> 
>> all.equal(.57 * 100, as.integer(57))
> [1] TRUE
>> all.equal(.29 * 100, as.integer(29))
> [1] TRUE

And now you should also read ?all.equal:

  "'all.equal(x,y)' is a utility to compare R objects 'x' and 'y'
   testing 'near equality'.
   [...]
   Usage:
   [...]
   all.equal(target, current,
             tolerance = .Machine$double.eps ^ 0.5,
             scale = NULL, check.attributes = TRUE, ...)
   [...]
   tolerance: numeric >= 0.  Differences smaller than 'tolerance'
   are not considered."

Now, on my R,

  .Machine$double.eps ^ 0.5
  # [1] 1.490116e-08

whereas (see above) (57 - .57 * 100) = 7.105427e-15, which is smaller
than .Machine$double.eps ^ 0.5.
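
The contrast between exact and tolerant comparison can be sketched as
follows. Note that all.equal() returns a character string rather than
FALSE when the values differ, hence the isTRUE() wrapper; the variable
names x and tol are mine.

```r
x   <- .57 * 100
tol <- .Machine$double.eps ^ 0.5

x == 57                                   # FALSE: exact comparison
isTRUE(all.equal(x, 57))                  # TRUE: within default tolerance
isTRUE(all.equal(x, 57, tolerance = 0))   # FALSE: no tolerance allowed
abs(x - 57) < tol                         # TRUE: the check done by hand
```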

> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?

Yes! And, as you suspect, it is all down to the binary representation
of fractional numbers input as decimal. There is no finite-length
binary fraction which is exactly equal to 0.57[decimal].

If there were, then for some integer k, (2^k)*0.57 would be an
exact integer. You can easily verify that this is not the case.
Just keep doubling 0.57: the series starts as

  0.57  1.14  2.28  4.56  9.12  18.24  ...

and finally, at the 23rd position, you get 2390753.28 and you are
now back at the "****.28" fractional part (as at position 3 above).
Hence the fractional parts will cycle through .28, .56, .12, ...
forever, so there is no exact binary representation of 0.57.

To be absolutely sure of it, you should do it by hand on paper
(lest you tickle rounding errors in R), but R will in fact give
you the sequence:

  0.57*2^(0:23)
  #  [1]         0.57         1.14        *2.28*        4.56
  #  [5]         9.12        18.24        36.48        72.96
  #  [9]       145.92       291.84       583.68      1167.36
  # [13]      2334.72      4669.44      9338.88     18677.76
  # [17]     37355.52     74711.04    149422.08    298844.16
  # [21]    597688.32   1195376.64  *2390753.28*  4781506.56
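
As a stand-in for the pencil-and-paper check, the fractional parts can
be tracked in exact integer arithmetic by working in hundredths: start
at 57 and keep doubling modulo 100. This is only a sketch of the
argument above; the name frac is my own.

```r
# Fractional part of 0.57 * 2^(i-1), expressed in hundredths,
# computed with integers only, so no rounding error can creep in.
frac <- integer(24)
frac[1] <- 57L
for (i in 2:24) {
  frac[i] <- (2L * frac[i - 1L]) %% 100L
}

frac                        # 57 14 28 56 12 24 ...
any(frac == 0L)             # FALSE: never an exact integer
which(duplicated(frac))[1]  # 23: first repeat, the .28 from position 3
```

Once a fractional part repeats, the doubling sequence repeats from
there, so a zero fractional part can never appear.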

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jan-10                                       Time: 13:32:31
------------------------------ XFMail ------------------------------