[R] Bug in "is" ?

Fri Sep 26 00:12:18 CEST 2008

On Thu, Sep 25, 2008 at 5:07 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Thu, Sep 25, 2008 at 4:23 PM, Wacek Kusnierczyk
> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>> Rolf Turner wrote:
>>>
>>> On 26/09/2008, at 1:27 AM, Petr PIKAL wrote:
>>>
>>>> Hi
>>>>
>>>> Sorry, but I cannot agree. If you measure something and the values in your
>>>> vector are
>>>>
>>>> c(5.1, 5.4, 4.8, 5.0)
>>>>
>>>> do you think the first three numbers should be double and the last one
>>>> integer? Why? It is just that the reading is not precise enough for the
>>>> last value to be, let's say, 5.02.
>>>>
>>>> I understand that you might assume that with the assignment x <- 5 you
>>>> put an integer into x, but you just put a number into it. It is an
>>>> integer in the sense that it has no fractional part (and it can be tested
>>>> for this feature), but it is not of class integer.
>>>>
>>>> Computer minds have limits and I prefer calculation to produce results
>>>> instead of NA
>>>>
>>>> try
>>>> > x <- 2*10^9
>>>> > x
>>>> > is.integer(x)
>>>> > x.i <- as.integer(x)
>>>> > is.integer(x.i)
>>>>
>>>> > y <- x.i + x.i
>>>> Warning message:
>>>> In x.i + x.i : NAs produced by integer overflow
>>>> > y
>>>> [1] NA
>>>> > y <- x + x
>>>> > y
>>>> [1] 4e+09
>>>
>>> This is indeed a compelling and convincing example.
>> the fact that R is not able to appropriately handle integer values such
>> as 2*10^9 + 2*10^9 is an argument for storing such numbers as
>> non-integers?  it is perhaps compelling and convincing if you already
>> have a rather negative opinion on the language.  integer overflow?  are
>> we programming in c?
>
> I'm losing track of the argument here.  Are you claiming that the
> calculation should be done in integer arithmetic, which means 32-bit
> signed integers in most (all?) implementations of R, or that it should
> not be done in floating point arithmetic, which means double precision
> floating point arithmetic?

Rats.  I edited that sentence and managed to create a double negative.
Just drop the "not" in that last sentence.

> If you don't do something to force integer arithmetic then the
> computation of 2*10^9 + 2*10^9 quietly hums along and produces the
> expected result.  The value is an integer but it is stored as a double
> precision floating point number, which is a good thing because it
> can't be represented as a 32-bit signed integer.  I think that this
> does what you want in that it produces a representation of the correct
> answer without your having to pay attention to the representation.
> The only time that you get into trouble on this calculation is when
> you use techniques to force integer arithmetic to be used.
>
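A minimal sketch of the two code paths described in the quoted paragraph, assuming a standard R build with 32-bit integers. The variable names (x_dbl, x_int, sum_dbl, sum_int) are illustrative only, and suppressWarnings() is used just to keep the overflow warning out of the way:

```r
# 32-bit signed integers top out at .Machine$integer.max (2147483647).
x_dbl <- 2 * 10^9             # ordinary arithmetic: the result is a double
x_int <- as.integer(x_dbl)    # 2e9 still fits in a 32-bit integer

sum_dbl <- x_dbl + x_dbl                     # 4e9, held exactly in a double
sum_int <- suppressWarnings(x_int + x_int)   # overflows, so R gives NA

typeof(sum_dbl)    # "double"
sum_dbl == 4e9     # TRUE
is.na(sum_int)     # TRUE
```

The NA only appears once as.integer() has forced integer storage; left alone, the computation stays in double precision.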
> It is true that is.integer(2*10^9 + 2*10^9) returns FALSE but as
> Kernighan and Plauger explained about floating point arithmetic "10
> times 0.1 is hardly ever 1". (I was surprised to find that it is, in
> fact, 1 in the versions of R that I tried, but that is another story.)
> Functions in a programming language can only return values based on
> the computer representation of the number, not on the mathematical
> concept of the number.
>
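A small sketch of the point about computer representation, assuming IEEE 754 double precision (which is what R uses on common platforms). The second comparison is an added example, not one from the thread:

```r
# 10 * 0.1 happens to round to exactly 1 in double precision ...
ten_tenths <- 10 * 0.1
ten_tenths == 1                      # TRUE

# ... but other decimal fractions do not behave so nicely.
three_tenths <- 0.1 + 0.2
three_tenths == 0.3                  # FALSE
isTRUE(all.equal(three_tenths, 0.3)) # TRUE: comparison within a tolerance
```

Exact equality tests answer a question about the stored bits; all.equal() is the usual way to ask the "mathematical" question up to a tolerance.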
> As for the question of the bug in "is" - although I hate to sound like
> a politician who was given the nickname "Slick Willie" (see Clinton,
> William J.), "it depends what your definition of `is' is."  The
> definition in R is a function that examines the class of an object and
> reports on whether the object has the given class or is of a class
> that inherits from the given class.  In this case it will give an
> answer based on the class of the computer representation of the
> number, not the mathematical concept of the number.
>
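To make the class-based definition of is() concrete, here is a short sketch. It assumes the methods package is available (it is attached by default in interactive sessions); the variables d and i are illustrative:

```r
library(methods)   # is() is defined here

d <- 5     # a plain numeric literal is stored as a double
i <- 5L    # the L suffix requests integer storage

class(d)            # "numeric"
class(i)            # "integer"
is(d, "integer")    # FALSE: same mathematical value, different class
is(i, "integer")    # TRUE
is(d, "numeric")    # TRUE
is(i, "numeric")    # TRUE: "integer" is treated as extending "numeric"
```

Both d and i represent the whole number five; is() reports only on how each one is represented.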
> If you want to check whether you have a number that is exactly an
> integer you should probably use
>
> x == round(x)
>
> That will work over a more extended range than will
>
> x == as.integer(x)
>
> because the former uses a double precision representation.
>
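A brief sketch of why the round() test covers a wider range than the as.integer() test, again assuming 32-bit integers. The names big, via_round and via_int are illustrative, and suppressWarnings() only hides the coercion warning:

```r
big <- 4e9    # a whole number beyond the 32-bit integer range

via_round <- big == round(big)                        # stays in double precision
via_int   <- big == suppressWarnings(as.integer(big)) # as.integer(4e9) is NA

via_round    # TRUE
via_int      # NA, because the coercion itself failed

# Inside the integer range the two tests agree.
5 == round(5)          # TRUE
5 == as.integer(5)     # TRUE
5.1 == round(5.1)      # FALSE
5.1 == as.integer(5.1) # FALSE
```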
>> try ruby: x = 2*10**9; Integer(x+x) == x+x
>> try python: x = 2*10**9; isinstance(x+x, int)
>> try perl, try octave, try mathematica, try ... (well, quite many
>> languages do what R does)
>>
>> there are languages that handle this so that the user does not have to
>> think about how the computer cannot think.  "Computer minds have limits
>> and I prefer calculation to produce results
>> instead of NA" -- sure, so why not have R calculate instead of returning
>> the silly NA?
>>
>>
>>> However it is to some extent off the point.  The problem is that the vast
>>> majority of users would expect a function named is.integer() to reveal
>>> whether values are integers --- whole numbers --- in the usual sense, to
>>> within some numerical tolerance.  They would not expect it to reveal some
>>> ever-so-slightly arcane information about the *storage mode* of the values.
>>>
>>> Admittedly the help page for is.integer() tells you this ``sort of''.  But
>>> only sort of.  Explicitly it says:
>>>
>>>      'is.integer' returns 'TRUE' or 'FALSE' depending on whether its
>>>      argument is of integer type or not, unless it is a factor when it
>>>      returns 'FALSE'.
>>>
>>> Now what on earth does ``integer type'' mean?  The concept ``type'' is not
>>> defined anywhere, and there is no help on ``type''.  There is no type()
>>> function.  One has to intuit, from the discussion of integer vectors
>>> existing so that they can be properly passed to .C() or .Fortran(), that
>>> type has something to do with storage mode.
>>
>> indeed.  one more example that R man pages are often rather
>> uninformative, despite verbosity.
>>
>>>
>>> It would have been better to have called the function now known as
>>> ``is.integer()'' something like ``is.storedAsInteger()'' and have a
>>> function is.integer() which does what people expect.  E.g.
>>>
>>>     is.integer(c(5.1, 5.4, 4.8, 5.0))
>>>
>>> would return
>>>
>>>     [1] FALSE FALSE FALSE TRUE
>>
>>>
>>> Despite what fortune(85) says, it is not unreasonable to expect functions
>>> to do what one would intuitively think that they should do.  E.g. sin(x)
>>> should not return 1/(1+x^2) even if the help page for sin() says clearly
>>> and explicitly that this is what it does.  (Aside:  help pages rarely if
>>> ever say *anything* clearly and explicitly, at least from the point of view
>>> of the person who does not already understand everything about the concepts
>>> being explained.)
>>
>> indeed.  one more opinion that R man pages are often rather
>> uninformative, despite verbosity.
>>
>>
>>>
>>> Be that as it may, this is all academic maundering.  The is.integer()
>>> function does what it does and THAT IS NOT GOING TO CHANGE.  We'll just
>>> have to deal with it.  Once one is *aware* that the results of is.integer
>>> are counter-intuitive, one can adjust one's expectations, and it's no big
>>> deal.
>>>
>>> I do think, however, that there ought to be a WARNING section in the help on
>>> is.integer() saying something like:
>>>
>>>     NOTE: is.integer() DOES NOT DO what you expect it to do.
>>
>> hehe.  this should be printed on the first page in every R tutorial:
>> "NOTE: R DOES NOT DO what you expect it to do" (seems i'm in a bad mood,
>> sorry, R is just fine).
>>
>>>
>>> In large friendly letters.
>>>
>>>     cheers,
>>>
>>>         Rolf Turner
>>>
>>> P. S.  Those who want a function that does what one would naturally expect
>>> is.integer() to do could define, e.g.
>>>
>>>     is.whole.number <- function(x) {
>>>         abs(x-round(x)) < sqrt(.Machine$double.eps)
>>>     }
>>>
>>> Then
>>>
>>>         is.whole.number(c(5.1, 5.4, 4.8, 5.0))
>>>
>>> returns
>>>
>>>     [1] FALSE FALSE FALSE TRUE
>>>
>>> just as one would want, hope, and expect.
>>
>> if *this* is what one would want, hope, and expect from is.integer, why
>> can't we want, hope, and expect that it eventually happens?
>>
>> vQ
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.