[Rd] Inconsistent handling of data frames in min(), max(), and mean()
Gavin Simpson
ucfagls at gmail.com
Thu Aug 21 20:32:31 CEST 2014
This inconsistency recently came to my attention:
> df <- data.frame(A = 1:10, B = rnorm(10))
> min(df)
[1] -1.768958
> max(df)
[1] 10
> mean(df)
[1] NA
Warning message:
In mean.default(df) : argument is not numeric or logical: returning NA
I recall the time when `mean(df)` would give `colMeans(df)`, and this
behaviour was deemed inconsistent. It seems, though, that the change has
removed one inconsistency and replaced it with another.
Am I missing good reasons why there couldn't be a `mean.data.frame()`
method which worked like `max()` etc when given a data frame? Namely that
they return the required statistic *only* when presented with a data frame
of all numeric variables? E.g.
> df <- data.frame(A = 1:10, B = rnorm(10), C = factor(rep(c("A","B"), each = 5)))
> max(df)
Error in FUN(X[[1L]], ...) :
only defined on a data frame with all numeric variables
I would expect `mean(df)` to fail with the same error as `max(df)` for
the new example `df`, but to return the same as `mean(as.matrix(df))`
or `mean(colMeans(df))` if given an entirely numeric data frame (the last
call below shows the result I would expect, not what R currently returns):
> mean(colMeans(df[, 1:2]))
[1] 2.78366
> mean(as.matrix(df[, 1:2]))
[1] 2.78366
> mean(df[,1:2])
[1] 2.78366
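A minimal sketch of such a method, to make the proposal concrete. It is hypothetical: `mean.data.frame()` as written here is not part of base R, and the check simply mirrors the error shown above for `max(df)`, on the assumption that converting with `as.matrix()` and testing the result is an acceptable way to detect non-numeric columns.

```r
# Hypothetical method, not base R: behave like max()/min() on data frames.
mean.data.frame <- function(x, ...) {
  # A data frame with any factor or character column becomes a
  # character matrix, so one test on the matrix covers all columns.
  m <- as.matrix(x)
  if (!is.numeric(m) && !is.logical(m))
    stop("only defined on a data frame with all numeric variables")
  # Pool every value into a single mean, as mean(as.matrix(x)) does;
  # arguments such as na.rm are passed through unchanged.
  mean(m, ...)
}
```

With this defined, `mean(df[, 1:2])` would agree with `mean(as.matrix(df[, 1:2]))`, and `mean(df)` on the three-column example would stop with an error instead of returning NA. Note that with missing values and `na.rm = TRUE` the pooled mean need not equal `mean(colMeans(df, na.rm = TRUE))`, so a real method would have to pick one of the two definitions.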
I just can't see the sense in having `mean` work the way it does now.
Thanks,
Gavin
--
Gavin Simpson, PhD