[Rd] Precision of function mean,bug?
Morgan Morgan
morg@n@em@||box @end|ng |rom gm@||@com
Thu May 21 18:34:35 CEST 2020
Sorry, posting back to the list.
Thank you all.
Morgan
On Thu, 21 May 2020, 16:33 Henrik Bengtsson, <henrik.bengtsson using gmail.com>
wrote:
> Hi.
>
> Good point and a good example. Feel free to post to the list. The purpose
> of my reply wasn't to take away Peter's point but to emphasize that
> base::mean() does a two-pass scan over the elements too lower the impact of
> addition of values with widely different values (classical problem in
> numerical analysis). But I can see how it may look like that.
>
> Cheers,
>
> Henrik
>
>
> On Thu, May 21, 2020, 03:21 Morgan Morgan <morgan.emailbox using gmail.com>
> wrote:
>
>> Thank you Henrik for the feedback.
>> Note that for idx=4 and refine = TRUE, your equality b==c is FALSE. I
>> think that as Peter said == can't be trusted with FP.
>> His example is good. Here is an even more shocking one.
>> a=0.786546798
>> b=a+ 1e6 -1e6
>> a==b
>> # [1] FALSE
>>
>> Best regards
>> Morgan Jacob
>>
>> On Wed, 20 May 2020, 20:18 Henrik Bengtsson, <henrik.bengtsson using gmail.com>
>> wrote:
>>
>>> On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel
>>> <r-devel using r-project.org> wrote:
>>> >
>>> > > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard <
>>> pdalgd using gmail.com> wrote:
>>> > >
>>> > > Expected, see FAQ 7.31.
>>> > >
>>> > > You just can't trust == on FP operations. Notice also
>>> >
>>> > Additionally, since you're implementing a "mean" function you are
>>> testing
>>> > against R's mean, you might want to consider that R uses a two-pass
>>> > calculation[1] to reduce floating point precision error.
>>>
>>> This one is important.
>>>
>>> FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to
>>> calculate mean with and without this two-pass calculation;
>>>
>>> > a <- c(x[idx],y[idx],z[idx]) / 3
>>> > b <- mean(c(x[idx],y[idx],z[idx]))
>>> > b == a
>>> [1] FALSE
>>> > b - a
>>> [1] 2.220446e-16
>>>
>>> > c <- matrixStats::mean2(c(x[idx],y[idx],z[idx])) ## default to
>>> refine=TRUE
>>> > b == c
>>> [1] TRUE
>>> > b - c
>>> [1] 0
>>>
>>> > d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE)
>>> > a == d
>>> [1] TRUE
>>> > a - d
>>> [1] 0
>>> > c == d
>>> [1] FALSE
>>> > c - d
>>> [1] 2.220446e-16
>>>
>>> Not surprisingly, the two-pass higher-precision version (refine=TRUE)
>>> takes roughly twice as long as the one-pass quick version
>>> (refine=FALSE).
>>>
>>> /Henrik
>>>
>>> >
>>> > Best,
>>> >
>>> > Brodie.
>>> >
>>> > [1]
>>> https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482
>>> >
>>> > > > a2=(z[idx]+x[idx]+y[idx])/3
>>> > > > a2==a
>>> > > [1] FALSE
>>> > > > a2==b
>>> > > [1] TRUE
>>> > >
>>> > > -pd
>>> > >
>>> > > > On 20 May 2020, at 12:40 , Morgan Morgan <
>>> morgan.emailbox using gmail.com> wrote:
>>> > > >
>>> > > > Hello R-dev,
>>> > > >
>>> > > > Yesterday, while I was testing the newly implemented function
>>> pmean in
>>> > > > package kit, I noticed a mismatch in the output of the below R
>>> expressions.
>>> > > >
>>> > > > set.seed(123)
>>> > > > n=1e3L
>>> > > > idx=5
>>> > > > x=rnorm(n)
>>> > > > y=rnorm(n)
>>> > > > z=rnorm(n)
>>> > > > a=(x[idx]+y[idx]+z[idx])/3
>>> > > > b=mean(c(x[idx],y[idx],z[idx]))
>>> > > > a==b
>>> > > > # [1] FALSE
>>> > > >
>>> > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and
>>> many
>>> > > > others the difference is small but still.
>>> > > > Is that expected or is it a bug?
>>> >
>>> > ______________________________________________
>>> > R-devel using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
[[alternative HTML version deleted]]
More information about the R-devel
mailing list