[Rd] Accuracy Bug (PR#1228), (PR#6743)

maechler at stat.math.ethz.ch maechler at stat.math.ethz.ch
Tue Apr 6 12:10:45 CEST 2004


>>>>> "daheiser" == daheiser  <daheiser at gvn.net>
>>>>>     on Tue,  6 Apr 2004 04:24:35 +0200 (CEST) writes:

    daheiser> It is an error in the algorithm. 

"it" being the behavior reported in bug report PR#1228 ---
 [too bad you didn't use the whole string "PR#1228" in your subject;
  if you had, no new report would have been created, and things
  would have gone to the proper place in the R-bugs repository ...]

namely the fact that
       var(rep(1e30, n))  
does not always (for all n) give 0.  With the following function

  tst.var <- function(x,nset= 2:100) {
     for(n in nset) {
	v <- var(rep(x,n))
	if(v != 0) cat(sprintf("%4d: %20.12g\n", n,v))
     }
  }

Actually, on my current desktop (AMD Athlon,Linux, R 1.9.0beta)
the computations seem to be more often exact than they used to:
Even  tst.var(1e30, 2:5000) doesn't produce any output nor do a few other 'x'
arguments I tried --- but the fundamental criticism is still
correct, and even on the Athlon, it's easy to find cases where
tst.var() *does* produce output.

    daheiser> It is an excellent example of how errors and
    daheiser> faults occur when the programmer follows the
    daheiser> mathematical formula exactly. Welford's algorithm
    daheiser> does not produce this error. It gives correct
    daheiser> standard deviation and variance values.

Actually, after reading

  @article{ChaTL97,
   author = {Tony F. Chan and John Gregg Lewis},
   title = {Computing standard deviations: accuracy},
   journal = {Commun. ACM},
   volume = 22,
   number = 9,
   year = 1979,
   issn = {0001-0782},
   pages = {526--531},
   doi = {http://doi.acm.org/10.1145/359146.359152},
   publisher = {ACM Press},
  }

  @article{WesD97,
   author = {D. H. D. West},
   title = {Updating mean and variance estimates: an improved method},
   journal = {Commun. ACM},
   volume = 22,
   number = 9,
   year = 1979,
   issn = {0001-0782},
   pages = {532--535},
   doi = {http://doi.acm.org/10.1145/359146.359153},
   publisher = {ACM Press},
  }

I'd conclude (from page 531) that Welford's algorithm is a bit
less accurate than the (very similar) "West" version,
and we (the R developers) should rather implement the latter.

Maybe for R 1.9.1 -- or even later  {there are even more
important numerical accuracy problems I know about!}

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><



More information about the R-devel mailing list