[Rd] double in summary.c : isum

Mon Mar 25 12:42:08 CET 2013

On 25.03.2013 11:31, Matthew Dowle wrote:
> On 25.03.2013 11:27, Matthew Dowle wrote:
>> On 25.03.2013 09:20, Prof Brian Ripley wrote:
>>> On 24/03/2013 15:01, Duncan Murdoch wrote:
>>>> On 13-03-23 10:20 AM, Matthew Dowle wrote:
>>>>> On 23.03.2013 12:01, Prof Brian Ripley wrote:
>>>>>> On 20/03/2013 12:56, Matthew Dowle wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please consider the following :
>>>>>>>
>>>>>>>> x = as.integer(2^30-1)
>>>>>>> [1] 1073741823
>>>>>>>> sum(c(rep(x, 10000000), rep(-x,9999999)))
>>>>>>> [1] 1073741824
>>>>>>>
>>>>>>> Tested on 2.15.2 and a recent R-devel (r62132).
>>>>>>>
>>>>>>> I'm wondering if s in isum could be LDOUBLE instead of double, 
>>>>>>> like
>>>>>>> rsum, to fix this edge case?
>>>>>>
>>>>>> No, because there is no guarantee that LDOUBLE differs from 
>>>>>> double
>>>>>> (and platform on which it does not).
>>>>>
>>>>> That's a reason for not using LDOUBLE at all isn't it? Yet 
>>>>> src/main/*.c
>>>>> has 19 lines using LDOUBLE e.g. arithmetic.c and cum.c as well as
>>>>> summary.c.
>>>>>
>>>>> I'd assumed LDOUBLE was being used by R to benefit from long 
>>>>> double (or
>>>>> equivalent) on platforms that support it (which is all modern 
>>>>> Unix, Mac
>>>>> and Windows as far as I know). I do realise that the edge case 
>>>>> wouldn't
>>>
>>> Actually, you don't know.  Really only on almost all Intel ix86: 
>>> most
>>> other current CPUs do not have it in hardware.  C99/C11 require 
>>> long
>>> double, but does not require the accuracy that you are thinking of 
>>> and
>>> it can be implemented in software.
>>
>> This is very interesting, thanks. Which of the CRAN machines don't
>> support LDOUBLE with higher accuracy than double, either in hardware
>> or software?  Yes I had assumed that all CRAN machines would do. It
>> would be useful to know for something else I'm working on as well.
>>
>>>>> be fixed on platforms where LDOUBLE is defined as double.
>>>>
>>>> I think the problem is that there are two opposing targets in R:  
>>>> we
>>>> want things to be as accurate as possible, and we want them to be
>>>> consistent across platforms. Sometimes one goal wins, sometimes 
>>>> the
>>>> other.  Inconsistencies across platforms give false positives in 
>>>> tests
>>>> that tend to make us miss true bugs.  Some people think we should 
>>>> never
>>>> use LDOUBLE because of that.  In other cases, the extra accuracy 
>>>> is so
>>>> helpful that it's worth it.  So I think you'd need to argue that 
>>>> the
>>>> case you found is something where the benefit outweighs the costs. 
>>>> Since
>>>> almost all integer sums are done exactly with the current code, is 
>>>> it
>>>> really worth introducing inconsistencies in the rare inexact 
>>>> cases?
>>>
>>> But as I said lower down, a 64-bit integer accumulator would be
>>> helpful, C99/C11 requires one at least that large and it is
>>> implemented in hardware on all known R platforms.  So there is a 
>>> way
>>> to do this pretty consistently across platforms.
>>
>> That sounds much better. Is it just a matter of changing s to be
>> declared as uint64_t?
>
> Typo. I meant int64_t.

But even 64-bit integer might under or overflow. Which is one of the 
reasons for accumulating in double (or LDOUBLE) isn't it? To save a
test for over/underflow on each iteration.

>
>>
>>>>
>>>> Duncan Murdoch
>>>>
>>>>
>>>>>
>>>>> What have I misunderstood?
>>>>>
>>>>>>
>>>>>> Users really need to take responsibility for the numerical 
>>>>>> stability
>>>>>> of calcuations they attempt.  Expecting to sum 20 million large
>>>>>> numbers exactly is unrealistic.
>>>>>
>>>>> Trying to take responsibility, but you said no. Changing from 
>>>>> double to
>>>>> LDOUBLE would mean that something that wasn't realistic, was then
>>>>> realistic (on platforms that support long double).
>>>>>
>>>>> And it would bring open source R into line with TERR, which gets 
>>>>> the
>>>>> answer right, on 64bit Windows at least. But I'm not sure I 
>>>>> should be as
>>>>> confident in TERR as I am in open source R because I can't see 
>>>>> its
>>>>> source code.
>>>>>
>>>>>
>>>>>> There are cases where 64-bit integer accumulators would be
>>>>>> beneficial, and this is one.  Unfortunately C11 does not require 
>>>>>> them
>>>>>> but some optional moves in that direction are planned.
>>>>>>
>>>>>>>
>>>>>>> https://svn.r-project.org/R/trunk/src/main/summary.c
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Matthew
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>