[R] result of mean(v1, v2, v3) of three real number not the same as sum(v1, v2, v3)/3
Avi Gross
@v|gro@@ @end|ng |rom ver|zon@net
Fri May 13 00:59:47 CEST 2022
I think, in addition to earlier replies, that there are effects in R based on just about everything being a vector. The number 1 is a vector of length 1 and is not really different than a longer one like c(1,2,3) except for having a length of 1 versus 3.
So sum(1:3, 3:1, 1) adds to 13 as it is seen as sum(1, 2, 3, 3, 2, 1, 1) in a sense.
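A quick check of that flattening, as a minimal sketch using nothing beyond base R's sum() and c():

```r
# sum() pools every element of every argument before adding
a <- sum(1:3, 3:1, 1)
b <- sum(c(1, 2, 3, 3, 2, 1, 1))
print(a)        # 13
print(a == b)   # TRUE
```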
Your example did not trip up an error but this does: mean(1:3, 7:9)
Error in mean.default(1:3, 7:9) : 'trim' must be numeric of length one
There is the hint: the optional trim= argument only works with a single scalar. I gave it a vector of c(7,8,9) and it barfed. Other R functionality often just issues a warning when it gets a vector where it expected a scalar, even though the two otherwise look the same. For instance, if() given a condition of length greater than one used to only warn, and as of R 4.2.0 treats it as an error.
How would you rewrite mean to catch odd cases? I can understand it assuming a second argument to be for trim and a third argument for na.rm but it does not check for it to be "TRUE" and accepts non-zero numbers as a version of na.rm=TRUE.
But anything with 4 or more arguments ought to be visibly WRONG but is apparently blindly accepted as mean(1,2,3,4,5,6,7,8,9) returns the wrong answer of 1 with no complaint.
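To see where the extra arguments end up, here is a small sketch using match.call() against mean.default(), the method that handles plain numbers. It only illustrates the argument matching and is not a proposal for changing mean():

```r
# Show how positional arguments are matched to mean.default()'s formals
matched <- match.call(mean.default, quote(mean(1, 2, 3)))
print(matched)    # mean(x = 1, trim = 2, na.rm = 3)

# With nine arguments only the first is data; trim = 2 is >= 0.5,
# so the result is the median of x, and x is just the number 1
res_wrong <- mean(1, 2, 3, 4, 5, 6, 7, 8, 9)
res_right <- mean(c(1, 2, 3, 4, 5, 6, 7, 8, 9))
print(res_wrong)  # 1
print(res_right)  # 5
```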
I am a fan of never passing optional arguments positionally, that is without their keyword, even if it is legal.
If your function insisted on being called like:
mean(num, trim=num, na.rm=TRUE)
then it could easily catch errors like the one we are discussing.
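One way such a check might look, as a rough sketch only. The name strict_mean is hypothetical and not part of R. It relies on the fact that formals placed after ... can only be matched by their full name, so anything passed positionally after the data lands in ... and can be rejected:

```r
# Hypothetical wrapper: 'strict_mean' is an illustration, not an R function
strict_mean <- function(x, ..., trim = 0, na.rm = FALSE) {
  # Anything landing in '...' was passed positionally after x
  if (...length() > 0L) {
    stop("pass the data as one vector, e.g. strict_mean(c(1, 2, 3)); ",
         "trim and na.rm must be named")
  }
  # Insist on a genuine logical flag rather than any non-zero number
  if (!is.logical(na.rm) || length(na.rm) != 1L || is.na(na.rm)) {
    stop("'na.rm' must be TRUE or FALSE")
  }
  mean(x, trim = trim, na.rm = na.rm)
}

print(strict_mean(c(1, 2, 3)))                 # 2
print(strict_mean(c(1, NA, 3), na.rm = TRUE))  # 2
# strict_mean(1, 2, 3) and strict_mean(1:3, 7:9) both stop with an error
```

The cost is exactly the one discussed here: any existing call that passes trim or na.rm by position would stop working with such a wrapper.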
I note median() is similarly flawed. Besides x it documents only na.rm, and it still does not catch getting 3 or more arguments. sd() returns NA for sd(3,3), but only because sd(3) is equally NA. With more arguments, as in sd(3, 4, 5), it does fail with an error about unused arguments.
So there are some inconsistencies that catch some errors and not others but a complete "fix" may not work as long as R allows vectors of length 1 in all contexts.
Making optional arguments required to be spelled out would break much existing code too.
For now, just using c() in solutions works well.
-----Original Message-----
From: Henrik Bengtsson <henrik.bengtsson using gmail.com>
To: Ivan Krylov <krylov.r00t using gmail.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Sent: Thu, May 12, 2022 4:39 pm
Subject: Re: [R] result of mean(v1, v2, v3) of three real number not the same as sum(v1, v2, v3)/3
There's actually another reason why mean(x) and sum(x)/length(x) may
differ, e.g.
x <- c(rnorm(1e6, sd=.Machine$double.eps), rnorm(1e6, sd=1))
mean(x) - sum(x)/length(x)
#> [1] 1.011781e-18
The mean() function calculates the sample mean using a two pass scan
through the data. The first scan calculates the total sum and divides
by the number of (non-missing) values. In the second scan, this
average is refined by adding the residuals towards the first average.
This way numerical precision of mean(x) is higher than
sum(x)/length(x) when the spread of 'x' is large. It also means
that the processing time of mean(x) is roughly twice that of
sum(x)/length(x).
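
The two-pass idea can be sketched in plain R. This is only an
illustration of the refinement step described above: the helper name
two_pass_mean is hypothetical, the real implementation lives in R's C
code, and this version should not be expected to match mean() bit for
bit.

```r
# Hypothetical helper: 'two_pass_mean' is an illustration, not R's own code
two_pass_mean <- function(x) {
  n <- length(x)
  first <- sum(x) / n          # pass 1: plain average
  first + sum(x - first) / n   # pass 2: add the mean residual as a correction
}

set.seed(1)  # arbitrary seed, only so the run is repeatable
x <- c(rnorm(1e6, sd = .Machine$double.eps), rnorm(1e6, sd = 1))
print(two_pass_mean(x) - mean(x))      # compare the sketch with mean()
print(mean(x) - sum(x) / length(x))    # compare mean() with the one-pass sum
```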
/Henrik
On Thu, May 12, 2022 at 1:22 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
>
> Eric Berger and Marc Schwartz and David K Stevens probably said it
> better. I was trying to illustrate the way mean() takes its arguments
> using the match.call function.
>
> The sum() function can take individual numbers or vectors and
> sum all their elements, so sum(c(1, 2, 3)) is the same as sum(1, 2, 3),
> or even sum(c(1, 2), 3): they all do what you mean them to do.
>
> The mean() function is different. It may accept many arguments, but
> only the first of them is the vector of numbers you're interested in:
> mean(c(1, 2, 3)) is the correct way to call it. Unfortunately, when you
> give it more arguments and they aren't what mean() expects them to be
> (the second one should be a number in [0; 0.5] and the third one should
> be TRUE or FALSE, see help(mean) if you're curious), R doesn't warn you
> or raise an error condition.
>
> My use of match.call() was supposed to show that by calling mean(a, b,
> c), I pass the number "b" as the "trim" argument to mean() and the
> number "c" as the "na.rm" argument to mean(), which is not what was
> intended here.
>
> --
> Best regards,
> Ivan
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.