[R] Best way to compute a sum

Thu Jun 24 22:49:35 CEST 2010

On Jun 24, 2010, at 4:08 PM, Lasse Kliemann wrote:

>> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
>> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
>> c <- sum(1/(1:200000000))
>> d <- sum(1/(200000000:1))
>> order(c(a,b,c,d))
>  [1] 1 2 4 3
>> b<c
>  [1] TRUE
>> c==d
>  [1] FALSE
>
> I'd expected b being the largest, since we sum up the smallest
> numbers first.

Can you explain? I would have assumed the floating point algorithms  
would incorporate some sort of sensible rounding strategy that would  
prevent consistent underflow.

> Instead, c is the largest, which is sum() applied
> to the vector ordered with largest numbers first.
>
> Can anyone shed some light on this?
 > sum(1/(200000000:1))
[1] 19.69104
 > sum(1/(1:200000000))
[1] 19.69104
 > b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
 > b
[1] 19.69104
 > sum(1/(1:200000000)) - sum(1/(200000000:1))
[1] 7.105427e-15
 > sum(1/(200000000:1)) -b
[1] 1.673328e-12
 > sum(1/(1:200000000)) -b
[1] 1.680434e-12
 > .Machine$double.eps
[1] 2.220446e-16
 > .Machine$double.eps ^ 0.5
[1] 1.490116e-08

>
> What is the best way in R to compute a sum while avoiding
> cancellation effects?
>
> By the way, sum() in the above example is much faster than the
> loops, so it would be nice if we could utilize it.

Why wouldn't you? Generally differences as small as you are observing  
are either meaningless or unavoidable using double precision  
arithmetic. If you need exact arithmetic, then pick an appropriate  
platform.

-- 
David.