[R] Bug in "is" ?
Douglas Bates
bates at stat.wisc.edu
Fri Sep 26 00:07:34 CEST 2008
On Thu, Sep 25, 2008 at 4:23 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> Rolf Turner wrote:
>>
>> On 26/09/2008, at 1:27 AM, Petr PIKAL wrote:
>>
>>> Hi
>>>
>>> Sorry but I can not agree. If you measure something and your values in a
>>> vector are
>>>
>>> c(5.1, 5.4, 4.8, 5.0)
>>>
>>> do you think the first three numbers shall be double and the last one
>>> integer? Why? It is just that the reading is not precise enough for the
>>> last value to be, let's say, 5.02.
>>>
>>> I understand that you can assume that with making an assignment x <- 5
>>> you put an integer number to x but you just put a number to it. It is
>>> integer in respect that it does not have a fractional part (and it can
>>> be tested on this feature) but it is not a class integer
>>>
>>> Computer minds has limits and I prefer calculation to produce results
>>> instead of NA
>>>
>>> try
>>> x<-2*10^9
>>> x
>>> is.integer(x)
>>> x.i<-as.integer(x)
>>> is.integer(x.i)
>>>
>>> y<-x.i+x.i
>>> Warning message:
>>> In x.i + x.i : NAs produced by integer overflow
>>> > y
>>> [1] NA
>>> > y<-x+x
>>> > y
>>> [1] 4e+09
>>
>> This is indeed a compelling and convincing example.
> the fact that r is not able to appropriately handle integer values such
> as 2*10^9 + 2*10^9 is an argument for storing such numbers as
> non-integers? it is perhaps compelling and convincing if you already
> have a rather negative opinion on the language. integer overflow? are
> we programming in c?

I'm losing track of the argument here. Are you claiming that the
calculation should be done in integer arithmetic, which means 32-bit
signed integers in most (all?) implementations of R, or that it should
not be done in floating point arithmetic, which means double precision
floating point arithmetic?

If you don't do something to force integer arithmetic then the
computation of 2*10^9 + 2*10^9 quietly hums along and produces the
expected result. The value is an integer but it is stored as a double
precision floating point number, which is a good thing because it
can't be represented as a 32-bit signed integer. I think that this
does what you want in that it produces a representation of the correct
answer without your having to pay attention to the representation.

The only time that you get into trouble on this calculation is when
you use techniques to force integer arithmetic to be used.
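
A minimal sketch of the two cases described above, assuming a build of R
where integers are 32-bit signed values (the variable names follow the
quoted example; suppressWarnings() is used only to keep the output quiet):

```r
x <- 2 * 10^9                 # stored as a double, although the value is whole
type_x <- typeof(x)           # "double"
sum_double <- x + x           # 4e+09, computed in double precision

int_max <- .Machine$integer.max   # 2147483647 for 32-bit signed integers

x.i <- as.integer(x)          # forces integer storage; 2e9 still fits
type_xi <- typeof(x.i)        # "integer"

# Forcing integer arithmetic overflows: the result is NA (normally with a
# warning, suppressed here).
sum_int <- suppressWarnings(x.i + x.i)
is.na(sum_int)                # TRUE
```

Only the last computation misbehaves, and only because as.integer() was
used to force integer storage first.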
It is true that is.integer(2*10^9 + 2*10^9) returns FALSE but as
Kernighan and Plauger explained about floating point arithmetic "10
times 0.1 is hardly ever 1". (I was surprised to find that it is, in
fact, 1 in the versions of R that I tried, but that is another story.)
Functions in a programming language can only return values based on
the computer representation of the number, not on the mathematical
concept of the number.
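
A small illustration of the representation issue, assuming IEEE 754 double
precision (the usual case); the 0.1 + 0.2 comparison is an added example,
not one from the thread:

```r
ten_tenths <- 10 * 0.1 == 1            # TRUE, as observed above
three_tenths <- 0.1 + 0.2 == 0.3       # FALSE: the operands are not stored exactly

# all.equal() compares within a tolerance instead of bit for bit.
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# The value is a whole number, but the representation is a double.
stored_as_int <- is.integer(2 * 10^9 + 2 * 10^9)    # FALSE
```

In each case the answer reflects how the number is stored, not what it
means mathematically.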
As for the question of the bug in "is" - although I hate to sound like
a politician who was given the nickname "Slick Willie" (see Clinton,
William J.), "it depends what your definition of `is' is." The
definition in R is a function that examines the class of an object and
reports on whether the object has the given class or is of a class
that inherits from the given class. In this case it will give an
answer based on the class of the computer representation of the
number, not the mathematical concept of the number.
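
A rough sketch of how is() answers from the class rather than the value.
The class name "WholeCount" is hypothetical, defined here only to show the
inheritance case:

```r
library(methods)   # provides is() and setClass()

int_is_int <- is(5L, "integer")    # TRUE: 5L has class "integer"
dbl_is_int <- is(5, "integer")     # FALSE: 5 has class "numeric"
dbl_is_num <- is(5, "numeric")     # TRUE

# A hypothetical class that inherits from "integer".
setClass("WholeCount", contains = "integer")
wc <- new("WholeCount", 3L)
inherits_int <- is(wc, "integer")  # TRUE, via inheritance
```

The value 5 is the same whole number in both of the first two calls; only
the class of its representation differs.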
If you want to check whether you have a number that is exactly an
integer you should probably use

  x == round(x)

That will work over a more extended range than will

  x == as.integer(x)

because the former uses a double precision representation.
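
To make the difference in range concrete, here is a short sketch assuming
32-bit integers; the test vector is made up for illustration:

```r
x <- c(5.1, 5.0, 4e9)

# round() stays in double precision, so large whole numbers are handled.
whole_round <- x == round(x)                           # FALSE TRUE TRUE

# as.integer() cannot represent 4e9 and yields NA (with a warning,
# suppressed here), so the comparison is NA as well.
whole_asint <- suppressWarnings(x == as.integer(x))    # FALSE TRUE NA
```

Both tests agree on values that fit in an integer and differ only once
the value exceeds .Machine$integer.max.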
> try ruby: x = 2*10**9; Integer(x+x) == x+x
> try python: x = 2*10**9; isinstance(x+x, int)
> try perl, try octave, try mathematica, try ... (well, quite many
> languages do what R does)
>
> there are languages that handle this so that the user does not have to
> think about how the computer cannot think. "Computer minds has limits
> and I prefer calculation to produce results
> instead of NA" -- sure, so why not have R calculate instead of returning
> the silly NA?
>
>
>> However it is to some
>> extent off the point. The problem is that the vast majority of users
>> would
>> expect a function named is.integer() to reveal whether values are
>> integers
>> --- whole numbers --- in the usual sense, to within some numerical
>> tolerance.
>> They would not expect it to reveal some ever-so-slightly arcane
>> information
>> about the *storage mode* of the values.
>>
>> Admittedly the help page for is.integer() tells you this ``sort of''.
>> But only
>> sort of. Explicitly it says:
>>
>> 'is.integer' returns 'TRUE' or 'FALSE' depending on whether its
>> argument is of integer type or not, unless it is a factor when it
>> returns 'FALSE'.
>>
>> Now what on earth does ``integer type'' mean? The concept ``type'' is
>> not defined
>> anywhere, and there is no help on ``type''. There is no type()
>> function. One
>> has to intuit, from the discussion of integer vectors existing so that
>> they
>> can be properly passed to .C() or .Fortran(), that type has something
>> to do
>> with storage mode.
>
> indeed. one more example that R man pages are often rather
> uninformative, despite verbosity.
>
>>
>> It would have been better to have called the function now known as
>> ``is.integer()''
>> something like ``is.storedAsInteger()'' and have a function
>> is.integer() which
>> does what people expect. E.g.
>>
>> is.integer(c(5.1, 5.4, 4.8, 5.0))
>>
>> would return
>>
>> [1] FALSE FALSE FALSE TRUE
>
>>
>> Despite what fortune(85) says, it is not unreasonable to expect
>> functions to
>> do what one would intuitively think that they should do. E.g. sin(x)
>> should not return
>> 1/(1+x^2) even if the help page for sin() says clearly and explicitly
>> that this
>> is what it does. (Aside: help pages rarely if ever say *anything*
>> clearly and
>> explicitly, at least from the point of view of the person who does not
>> already
>> understand everything about the concepts being explained.)
>
> indeed. one more opinion that R man pages are often rather
> uninformative, despite verbosity.
>
>
>>
>> Be that as it may, this is all academic maundering. The is.integer()
>> function
>> does what it does and THAT IS NOT GOING TO CHANGE. We'll just have to
>> deal
>> with it. Once one is *aware* that the results of is.integer are
>> counter-intuitive,
>> one can adjust one's expectations, and it's no big deal.
>>
>> I do think, however, that there ought to be a WARNING section in the help on
>> is.integer() saying something like:
>>
>> NOTE: is.integer() DOES NOT DO what you expect it to do.
>
> hehe. this should be printed on the first page in every R tutorial:
> "NOTE: R DOES NOT DO what you expect it to do" (seems i'm in a bad mood,
> sorry, R is just fine).
>
>>
>> In large friendly letters.
>>
>> cheers,
>>
>> Rolf Turner
>>
>> P. S. Those who want a function that does what one would naturally
>> expect
>> is.integer() to do could define, e.g.
>>
>> is.whole.number <- function(x) {
>> abs(x-round(x)) < sqrt(.Machine$double.eps)
>> }
>>
>> Then
>>
>> is.whole.number(c(5.1, 5.4, 4.8, 5.0))
>>
>> returns
>>
>> [1] FALSE FALSE FALSE TRUE
>>
>> just as one would want, hope, and expect.
>
> if *this* is what one would want, hope, and expect from is.integer, why
> can't we want, hope, and expect that it eventually happens?
>
> vQ