[R] strange thing with sd
Douglas Bates
bates at stat.wisc.edu
Mon Mar 29 20:10:53 CEST 2004
> sd(rep(0.01, 15)) #0
> sd(rep(0.001, 15)) #4.489023e-19
> sd(rep(0.00001, 15)) #0
> sd(rep(0.00000001,15)) #1.712427e-24
>
> sd(rep(0.01, 13)) #1.805557e-18
> sd(rep(0.001, 13)) #4.513894e-19
> sd(rep(0.00001, 13)) #0
> sd(rep(0.00000001,13)) #0
>
> sd(rep(5.01, 15)) #0
> sd(rep(5.001, 15)) #4.489023e-19
> sd(rep(5.00001, 15)) #1.838704e-15
> sd(rep(5.00000001,15)) #9.19352e-16
>
> sd(rep(5.01, 13)) #9.244454e-16
> sd(rep(5.001, 13)) #9.244454e-16
> sd(rep(5.00001, 13)) #1.848891e-15
> sd(rep(5.00000001,13)) #0
Before we get too carried away with this thread could you all please
consider how the sd function calculates its result? Doing so makes it
a lot easier to decide why the result will be zero in some cases and a
number very close to zero in other cases. [Sorry if I sound testy but
one of the big advantages of Open Source software is that you don't
need to speculate on how it arrives at a result, you can actually
check.]
I'll tell you, it takes the square root of the variance. How is the
variance calculated for a numeric vector? First you calculate the
mean *using floating point arithmetic* in which it is not necessarily
true that N * k / N == k, or, as it is done in this case,
[k + k + ... + k]/N == k
where there are N terms in the sum.
There can be round-off error in floating point calculations, which is
why you shouldn't expect exact answers.
Development versions of R are subjected to extensive tests every day,
including tests on the numerical accuracy. Most of those tests end in
a check using the all.equal function which checks if the relative
difference is less than a threshold. That's about the best that you
can do with floating point arithmetic.
Here endeth the sermon.
More information about the R-help
mailing list