[Rd] Accuracy Bug (PR#1228), (PR#6743)
maechler at stat.math.ethz.ch
maechler at stat.math.ethz.ch
Tue Apr 6 12:10:45 CEST 2004
>>>>> "daheiser" == daheiser <daheiser at gvn.net>
>>>>> on Tue, 6 Apr 2004 04:24:35 +0200 (CEST) writes:
daheiser> It is an error in the algorithm.
"it" being the behavior reported in bug report PR#1228 ---
[too bad you didn't use the whole string "PR#1228" in your subject;
if you had, no new report would have been created, and things
would have gone to the proper place in the R-bugs repository ...]
namely the fact that
var(rep(1e30, n))
does not always (for all n) give 0. With the following function
tst.var <- function(x,nset= 2:100) {
for(n in nset) {
v <- var(rep(x,n))
if(v != 0) cat(sprintf("%4d: %20.12g\n", n,v))
}
}
Actually, on my current desktop (AMD Athlon,Linux, R 1.9.0beta)
the computations seem to be more often exact than they used to:
Even tst.var(1e30, 2:5000) doesn't produce any output nor do a few other 'x'
arguments I tried --- but the fundamental criticism is still
correct, and even on the Athlon, it's easy to find cases where
tst.var() *does* produce output.
daheiser> It is an excellent example of how errors and
daheiser> faults occur when the programmer follows the
daheiser> mathematical formula exactly. Welford's algorithm
daheiser> does not produce this error. It gives correct
daheiser> standard deviation and variance values.
Actually, after reading
@article{ChaTL97,
author = {Tony F. Chan and John Gregg Lewis},
title = {Computing standard deviations: accuracy},
journal = {Commun. ACM},
volume = 22,
number = 9,
year = 1979,
issn = {0001-0782},
pages = {526--531},
doi = {http://doi.acm.org/10.1145/359146.359152},
publisher = {ACM Press},
}
@article{WesD97,
author = {D. H. D. West},
title = {Updating mean and variance estimates: an improved method},
journal = {Commun. ACM},
volume = 22,
number = 9,
year = 1979,
issn = {0001-0782},
pages = {532--535},
doi = {http://doi.acm.org/10.1145/359146.359153},
publisher = {ACM Press},
}
I'd conclude (from page 531) that Welford's algorithm is a bit
less accurate than the (very similar) "West" version,
and we (the R developers) should rather implement the latter.
Maybe for R 1.9.1 -- or even later {there are even more
important numerical accuracy problems I know about!}
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-3408 fax: ...-1228 <><
More information about the R-devel
mailing list