[Rd] Discrepancy: R sum() VS C or Fortran sum

Fri Mar 16 18:34:39 CET 2018

R uses long double type for the accumulator (on platforms where it is 
available). This is also mentioned in ?sum:

"Where possible extended-precision accumulators are used, typically well 
supported with C99 and newer, but possibly platform-dependent."

Tomas

On 03/16/2018 06:08 PM, Pierre Chausse wrote:
> My simple functions were to compare the result with the gfortran 
> compiler sum() function.  I thought that the Fortran sum could not be 
> less precise than R. I was wrong. I am impressed. The R sum does in 
> fact match the result if we use the Kahan algorithm.
>
> P.
>
> I am glad to see that R sum() is more accurate than the gfortran 
> compiler sum.
>
> On 16/03/18 11:37 AM, luke-tierney at uiowa.edu wrote:
>> Install the gmp package, run your code, and then try this:
>>
>> bu <- gmp::as.bigq(u)
>> bs4 <- bu[1] + bu[2] + bu[3] + bu[4] + bu[5]
>> s4 <- as.double(bs4)
>> s1 - s4
>> ##  [1] 0
>> s2[[2]] - s4
>> ##  [1] 7.105427e-15
>> s3 - s4
>> ##  [1] 7.105427e-15
>> identical(s1, s4)
>> ##  [1] TRUE
>>
>> `bs4` is the exact sum of the binary rationals in your `u` vector;
>> `s4` is the closest double precision to this exact sum.
>>
>> Looking at the C source code for sum() will show you that it makes
>> some extra efforts to get a more accurate sum than your simple
>> version.
>>
>> Best,
>>
>> luke
>>
>> On Fri, 16 Mar 2018, Pierre Chausse wrote:
>>
>>> Hi all,
>>>
>>> I found a discrepancy between the sum() in R and either a sum done 
>>> in C or Fortran for vector of just 5 elements. The difference is 
>>> very small, but this is a very small part of a much larger numerical 
>>> problem in which first and second derivatives are computed 
>>> numerically. This is part of a numerical method course I am teaching 
>>> in which I want to compare speeds of R versus Fortran (We solve a 
>>> general equilibrium problem all numerically, if you want to know). 
>>> Because of this discrepancy, the Jacobian and Hessian in R versus in 
>>> Fortran are quite different, which results in the Newton method 
>>> producing a different solution (for a given stopping rule). Since 
>>> the solution computed in Fortran is almost identical to the 
>>> analytical solution, I suspect that the sum in Fortran may be more 
>>> accurate (That's just a guess).  Most of the time the sum produces 
>>> identical results, but for some numbers, it is different. The 
>>> following example, shows what happens:
>>>
>>> set.seed(12233)
>>> n <- 5
>>> a <- runif(n,1,5)
>>> e <- runif(n, 5*(1:n),10*(1:n))
>>> s <- runif(1, 1.2, 4)
>>> p <- runif(5, 3, 10)
>>> x <- c(e[-5], (sum(e*p)-sum(e[-5]*p[-5]))/p[5])
>>> u <- a^(1/s)*x^((s-1)/s)
>>> dyn.load("sumF.so")
>>>
>>> u[1] <- u[1]+.0001 ### If we do not add .0001, all differences are 0
>>> s1 <- sum(u)
>>> s2 <- .Fortran("sumf", as.double(u), as.integer(n), sf1=double(1),
>>>               sf2=double(1))[3:4]
>>> s3 <- .C("sumc", as.double(u), as.integer(n), sC=double(1))[[3]]
>>>
>>> s1-s2[[1]] ## R versus compiler sum() (Fortran)
>>>
>>> [1] -7.105427e-15
>>>
>>> s1-s2[[2]] ## R versus manual sum (Fortran
>>>
>>> [1] -7.105427e-15
>>>
>>> s1-s3 ## R Versus manual sum in C
>>>
>>> [1] -7.105427e-15
>>>
>>> s2[[2]]-s2[[1]] ## manual sum versus compiler sum() (Fortran)
>>>
>>> [1] 0
>>>
>>> s3-s2[[2]] ## Fortran versus C
>>>
>>> [1] 0
>>>
>>> My sumf and sumc are
>>>
>>>      subroutine sumf(x, n, sx1, sx2)
>>>      integer i, n
>>>      double precision x(n), sx1, sx2
>>>      sx1 = sum(x)
>>>      sx2 = 0.0d0
>>>      do i=1,n
>>>         sx2 = sx2+x(i)
>>>      end do
>>>      end
>>>
>>> void sumc(double *x, int *n, double *sum)
>>> {
>>>  int i;
>>>  double sum1 = 0.0;
>>>  for (i=0; i< *n; i++) {
>>>    sum1 += x[i];
>>>  }
>>>  *sum = sum1;
>>> }
>>>
>>> Can that be a bug?  Thanks.
>>>
>>>
>>
>