[Rd] R-devel Digest, Vol 181, Issue 22

Randall Pruim rpruim at calvin.edu
Sun Mar 25 15:56:22 CEST 2018


Thanks.

I am fully aware of what aggregate() returns, and I can post-process this into the form I want, provided the names are available.

But for foo, the returned object both differs in structure and loses the name altogether:

foo <- function(x) { c(mean = base::mean(x)) }
str(aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo))
## 'data.frame': 3 obs. of  2 variables:
##  $ Group.1: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
##  $ x      : num  5.01 5.94 6.59
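
To make the contrast concrete, here is a small side-by-side sketch. bar() is not defined in this message; the definition below is an assumption inferred from the "mean" and "sd" dimnames in the quoted str() output. The last step uses the do.call(data.frame, ...) idiom as one possible way to post-process the matrix column when the names are available.

```r
foo <- function(x) c(mean = base::mean(x))

# Assumed definition of bar(): not shown in this message, inferred from the
# "mean" and "sd" column names in the quoted str() output.
bar <- function(x) c(mean = base::mean(x), sd = stats::sd(x))

res_foo <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
res_bar <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)

# Length-2 results are bound into a matrix column that keeps the names ...
is.matrix(res_bar$x)
colnames(res_bar$x)

# ... while length-1 results collapse to a plain vector in a column called "x".
is.matrix(res_foo$x)
names(res_foo)

# With names available, the matrix column can be flattened into ordinary columns.
flat_bar <- do.call(data.frame, res_bar)
names(flat_bar)
```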

If $x were $mean, or if $x were a matrix with one column named “mean”, then I would not have had to write a custom aggregate.
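
For illustration only, the one-column-matrix shape described above can be built by hand with split(), lapply() and rbind(). This is a minimal sketch, not the modified aggregate() mentioned in this message; the names parts, stats_mat and tidy are hypothetical.

```r
foo <- function(x) c(mean = base::mean(x))

# Apply foo() to each group; every element is a named vector of length 1.
parts <- lapply(split(iris$Sepal.Length, iris$Species), foo)

# rbind() takes row names from the list names and column names from the
# names of the vectors, so the "mean" label survives even for one column.
stats_mat <- do.call(rbind, parts)
colnames(stats_mat)

# One row per group, with the statistic's own name as the column name.
tidy <- data.frame(Species = rownames(stats_mat), stats_mat, row.names = NULL)
names(tidy)
```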

For anyone interested, I’m doing this to create a function that can easily compute multiple group-wise summary statistics and return them in a tidy data frame with one row for each group. Naming depends on whether the ... arguments are named and on some optional arguments. Here is an example.

args(df_stats)
## function (formula, data, ..., drop = TRUE, fargs = list(), sep = "_",
##     format = c("wide", "long"), groups = NULL, long_names = TRUE,
##     nice_names = FALSE, na.action = "na.warn")

df_stats(Sepal.Length ~ Species, data = iris, mean, sd, R = range, Q = quantile)
##      Species mean_Sepal.Length sd_Sepal.Length R_1 R_2 Q_0% Q_25% Q_50% Q_75% Q_100%
## 1     setosa             5.006       0.3524897 4.3 5.8  4.3 4.800   5.0   5.2    5.8
## 2 versicolor             5.936       0.5161711 4.9 7.0  4.9 5.600   5.9   6.3    7.0
## 3  virginica             6.588       0.6358796 4.9 7.9  4.9 6.225   6.5   6.9    7.9

As I’ve said, I solved my problem by creating a slightly modified version of aggregate().  But it made me wonder whether this is a bug or a feature in aggregate().

—rjp


On Mar 25, 2018, at 6:00 AM, r-devel-request at r-project.org wrote:

Date: Sat, 24 Mar 2018 20:08:33 +0000 (UTC)
From: lmo <lukemolson at yahoo.com>
To: "R-devel at r-project.org" <R-devel at r-project.org>
Subject: [Rd] aggregate() naming -- bug or feature
Message-ID: <1099191693.397761.1521922113450 at mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Be aware that the object that aggregate returns with bar() is more complicated than you think.
str(aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar))
'data.frame':    3 obs. of  2 variables:
 $ Group.1: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
 $ x      : num [1:3, 1:2] 5.006 5.936 6.588 0.352 0.516 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mean" "sd"
So you get a two column data.frame whose second column is a matrix.

