[R] different interface to by (tapply)?
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Aug 30 23:23:49 CEST 2010
On Mon, Aug 30, 2010 at 3:54 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> You've already gotten some good replies re aggregate() and plyr; here are
> two more choices, from packages doBy and data.table, plus the others for
> a contained summary:
>
> key <- c(1,1,1,2,2,2)
> val1 <- rnorm(6)
> indf <- data.frame( key, val1)
> outdf <- by(indf, indf$key, function(x) c(m=mean(x$val1), s=sd(x$val1)) )
> outdf
>
> # Alternatives:
>
> # aggregate (base) with new formula interface
>
> # write a small function to return multiple outputs
> f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
>
> aggregate(val1 ~ key, data = indf, FUN = f)
>   key  val1.mean   val1.sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package doBy (get the same output)
>
> library(doBy)
> summaryBy(val1 ~ key, data = indf, FUN = f)
>   key  val1.mean   val1.sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package plyr
>
> library(plyr)
> ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
>   key       mean        sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package data.table
>
> library(data.table)
> indt <- data.table(indf)
> indt[, list(mean = mean(val1), sd = sd(val1)), by = list(as.integer(key))]
>      key       mean        sd
> [1,]   1 -0.9783589 0.6378922
> [2,]   2  0.2816016 1.4490699
>
> It's a cornucopia! :) Multiple grouping variables are no problem with these
> functions, BTW.
>
> HTH,
And here are yet four more:
>
> f.by <- function(x) c(key = x$key[1], mean = mean(x$val1), sd = sd(x$val1))
> do.call(rbind, by(indf, indf["key"], f.by))
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(sqldf)
> sqldf("select key, avg(val1) mean, stdev(val1) sd from indf group by key")
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(remix)
> remix(val1 ~ key, transform(indf, key = factor(key)), funs = c(mean, sd))
val1 ~ key
==========
+-----+---+------+-------+------+
|                | mean  | sd   |
+=====+===+======+=======+======+
| key | 1 | val1 | 0.01  | 0.38 |
+     +---+------+-------+------+
|     | 2 | val1 | 0.25  | 0.44 |
+-----+---+------+-------+------+
>
> library(Hmisc)
> summary(val1 ~ key, indf, fun = function(x) c(mean = mean(x), sd = sd(x)))
val1 N=6
+-------+-+-+-----------+---------+
|       | |N|mean       |sd.val1  |
+-------+-+-+-----------+---------+
|key    |1|3|0.006794852|0.3779713|
|       |2|3|0.251890650|0.4379315|
+-------+-+-+-----------+---------+
|Overall| |6|0.129342751|0.3897180|
+-------+-+-+-----------+---------+
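
On the remark above that multiple grouping variables are no problem: here is a minimal base-R sketch of that using the aggregate() formula interface. The second grouping column (grp), the seed, and the 8-row data frame are made up for illustration and are not from the thread. The last step flattens the matrix column that aggregate() returns when FUN gives more than one value.

```r
# Illustrative data only: 'grp' and the seed are assumptions, not from the thread
set.seed(1)  # make the random draws reproducible
indf <- data.frame(key  = rep(1:2, each = 4),
                   grp  = rep(c("a", "b"), times = 4),
                   val1 = rnorm(8))

# Small function returning several named summaries at once
f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

# Two grouping variables on the right-hand side of the formula
res <- aggregate(val1 ~ key + grp, data = indf, FUN = f)

# aggregate() keeps both summaries in one matrix column called val1;
# do.call(data.frame, ...) spreads it into ordinary columns
flat <- do.call(data.frame, res)
flat
```

After the last step, flat has one row per key/grp combination and separate val1.mean and val1.sd columns, which is usually easier to work with downstream than the matrix column.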
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com