[Rd] Precision of function mean,bug?

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Wed May 20 21:18:16 CEST 2020


On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
<r-devel using r-project.org> wrote:
>
>  > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <pdalgd using gmail.com> wrote:
> >
> > Expected, see FAQ 7.31.
> >
> > You just can't trust == on FP operations. Notice also
>
> Additionally, since you're implementing a "mean" function you are testing
> against R's mean, you might want to consider that R uses a two-pass
> calculation[1] to reduce floating point precision error.

This one is important.

FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to
calculate mean with and without this two-pass calculation;

> a <- c(x[idx],y[idx],z[idx]) / 3
> b <- mean(c(x[idx],y[idx],z[idx]))
> b == a
[1] FALSE
> b - a
[1] 2.220446e-16

> c <- matrixStats::mean2(c(x[idx],y[idx],z[idx]))  ## default to refine=TRUE
> b == c
[1] TRUE
> b - c
[1] 0

> d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE)
> a == d
[1] TRUE
> a - d
[1] 0
> c == d
[1] FALSE
> c - d
[1] 2.220446e-16

Not surprisingly, the two-pass higher-precision version (refine=TRUE)
takes roughly twice as long as the one-pass quick version
(refine=FALSE).

/Henrik

>
> Best,
>
> Brodie.
>
> [1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
>
> > > a2=(z[idx]+x[idx]+y[idx])/3
> > > a2==a
> > [1] FALSE
> > > a2==b
> > [1] TRUE
> >
> > -pd
> >
> > > On 20 May 2020, at 12:40 , Morgan Morgan <morgan.emailbox using gmail.com> wrote:
> > >
> > > Hello R-dev,
> > >
> > > Yesterday, while I was testing the newly implemented function pmean in
> > > package kit, I noticed a mismatch in the output of the below R expressions.
> > >
> > > set.seed(123)
> > > n=1e3L
> > > idx=5
> > > x=rnorm(n)
> > > y=rnorm(n)
> > > z=rnorm(n)
> > > a=(x[idx]+y[idx]+z[idx])/3
> > > b=mean(c(x[idx],y[idx],z[idx]))
> > > a==b
> > > # [1] FALSE
> > >
> > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many
> > > others the difference is small but still.
> > > Is that expected or is it a bug?
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list