[R] strange thing with sd

Mon Mar 29 20:10:53 CEST 2004

> sd(rep(0.01,      15))  #0
> sd(rep(0.001,     15))  #4.489023e-19
> sd(rep(0.00001,   15))  #0
> sd(rep(0.00000001,15))  #1.712427e-24
> 
> sd(rep(0.01,      13))  #1.805557e-18
> sd(rep(0.001,     13))  #4.513894e-19
> sd(rep(0.00001,   13))  #0
> sd(rep(0.00000001,13))  #0
> 
> sd(rep(5.01,	  15))  #0
> sd(rep(5.001,	  15))  #4.489023e-19
> sd(rep(5.00001,   15))  #1.838704e-15
> sd(rep(5.00000001,15))  #9.19352e-16
> 
> sd(rep(5.01,	  13))  #9.244454e-16
> sd(rep(5.001,	  13))  #9.244454e-16
> sd(rep(5.00001,   13))  #1.848891e-15
> sd(rep(5.00000001,13))  #0

Before we get too carried away with this thread could you all please
consider how the sd function calculates its result?  Doing so makes it
a lot easier to decide why the result will be zero in some cases and a
number very close to zero in other cases.  [Sorry if I sound testy but
one of the big advantages of Open Source software is that you don't
need to speculate on how it arrives at a result, you can actually
check.]

I'll tell you, it takes the square root of the variance.  How is the
variance calculated for a numeric vector?  First you calculate the
mean *using floating point arithmetic* in which it is not necessarily
true that N * k / N == k, or, as it is done in this case,
  [k + k + ... + k]/N == k
where there are N terms in the sum.

There can be round-off error in floating point calculations, which is
why you shouldn't expect exact answers.

Development versions of R are subjected to extensive tests every day,
including tests on the numerical accuracy.  Most of those tests end in
a check using the all.equal function which checks if the relative
difference is less than a threshold.  That's about the best that you
can do with floating point arithmetic.

Here endeth the sermon.