[R] different interface to by (tapply)?

Mon Aug 30 23:23:49 CEST 2010

On Mon, Aug 30, 2010 at 3:54 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> You've already gotten some good replies about aggregate() and plyr; here are
> two more choices, from packages doBy and data.table, together with the others
> for a self-contained summary:
>
>  key <- c(1,1,1,2,2,2)
>  val1 <- rnorm(6)
>  indf <- data.frame( key, val1)
>  outdf <- by(indf, indf$key, function(x) c(m = mean(x$val1), s = sd(x$val1)))
>  outdf
>
> # Alternatives:
>
> # aggregate (base) with new formula interface
>
> # write a small function to return multiple outputs
> f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
>
> aggregate(val1 ~ key, data = indf, FUN = f)
>  key  val1.mean    val1.sd
> 1   1 -0.9783589  0.6378922
> 2   2  0.2816016  1.4490699
>
> # package doBy   (get the same output)
>
> library(doBy)
> summaryBy(val1 ~ key, data = indf, FUN = f)
>  key  val1.mean   val1.sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package plyr
>
> library(plyr)
> ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
>  key       mean        sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package data.table
>
> library(data.table)
> indt <- data.table(indf)
> indt[, list(mean = mean(val1), sd = sd(val1)), by = list(as.integer(key))]
>     key       mean        sd
> [1,]   1 -0.9783589 0.6378922
> [2,]   2  0.2816016 1.4490699
>
> It's a cornucopia! :) Multiple grouping variables are no problem with these
> functions, BTW.
>
> HTH,
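As a quick check of the point about multiple grouping variables, here is a minimal base-R sketch. It assumes a hypothetical second grouping column, grp, that is not in the original data, and uses 8 rows so that every key/grp combination has two observations (sd() of a single value is NA):

```r
# Toy data: 'key' as above plus a hypothetical second grouping column 'grp'
set.seed(1)  # fixed seed so the summary is reproducible
indf2 <- data.frame(key  = rep(c(1, 2), each = 4),
                    grp  = rep(c("a", "b"), times = 4),
                    val1 = rnorm(8))

# summary function returning two named values per group
f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

# list both grouping variables on the right-hand side of the formula
res <- aggregate(val1 ~ key + grp, data = indf2, FUN = f)
res
```

Each statistic comes back as a column of the matrix res$val1, which prints as val1.mean and val1.sd, just as in the one-variable aggregate() call above.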

And here are yet four more (the numbers differ from those above because rnorm() was rerun without a fixed seed):

>
> f.by <- function(x) c(key = x$key[1], mean = mean(x$val1), sd = sd(x$val1))
> do.call(rbind, by(indf, indf["key"], f.by))
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(sqldf)
> sqldf("select key, avg(val1) mean, stdev(val1) sd from indf group by key")
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(remix)
> remix(val1 ~ key, transform(indf, key = factor(key)), funs = c(mean, sd))
val1 ~ key
==========

+-----+---+------+-------+------+
|                | mean  | sd   |
+=====+===+======+=======+======+
| key | 1 | val1 | 0.01  | 0.38 |
+     +---+------+-------+------+
|     | 2 | val1 | 0.25  | 0.44 |
+-----+---+------+-------+------+
>
> library(Hmisc)
> summary(val1 ~ key, indf, fun = function(x) c(mean = mean(x), sd = sd(x)))
val1    N=6

+-------+-+-+-----------+---------+
|       | |N|mean       |sd.val1  |
+-------+-+-+-----------+---------+
|key    |1|3|0.006794852|0.3779713|
|       |2|3|0.251890650|0.4379315|
+-------+-+-+-----------+---------+
|Overall| |6|0.129342751|0.3897180|
+-------+-+-+-----------+---------+
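Since the subject line mentions tapply(), here is a minimal base-R sketch along those lines. It rebuilds indf with a fixed seed (an assumption, so the snippet runs on its own); tapply() computes one statistic per call, so the two results are combined into a data frame afterwards:

```r
set.seed(42)  # fixed seed: the examples above used unseeded rnorm()
indf <- data.frame(key = c(1, 1, 1, 2, 2, 2), val1 = rnorm(6))

# tapply() returns one statistic per group, named by the group levels
m <- tapply(indf$val1, indf$key, mean)
s <- tapply(indf$val1, indf$key, sd)

# combine the per-group results into a data frame like the other answers
outdf <- data.frame(key  = as.numeric(names(m)),
                    mean = as.vector(m),
                    sd   = as.vector(s))
outdf
```

Unlike the by() call in the original question, this gives an ordinary data frame directly; the price is one tapply() call per statistic.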

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com