[R] Better way to create tables of mean & standard deviations
hadley wickham
h.wickham at gmail.com
Tue Nov 7 14:15:48 CET 2006
> > I can only think of rather complex ways to solve the labeling issue...
> >
> > I would appreciate it if someone could point out if there are
> > better/cleaner/easier ways of achieving what I'm trying to do.
>
> Does this help?
>
> g <- function(y) {
>   # Summarize each column of y: mean, SD and non-missing count
>   s <- apply(y, 2,
>              function(z) {
>                z <- z[!is.na(z)]
>                n <- length(z)
>                # Every branch returns the same three named values
>                if(n==0) c(Mean=NA, SD=NA, N=0) else
>                if(n==1) c(Mean=z, SD=NA, N=1) else {
>                  m <- mean(z)
>                  s <- sd(z)
>                  c(Mean=m, SD=s, N=n)
>                }
>              })
>   w <- as.vector(s)
>   names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep=''))
>   w
> }
>
> df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
>                  Y = rnorm(480))
>
> library(Hmisc)
>
> with(df, summarize(cbind(Y),
>                    llist(LAB, BATCH),
>                    FUN = g,
>                    stat.name=c("mean", "stdev", "n")))
>
>    LAB BATCH        mean     stdev  n
> 1    1     1  0.13467569 1.0623188 30
> 2    1     2  0.15204232 1.0464287 30
> 3    2     1 -0.14470044 0.7881942 30
> 4    2     2 -0.34641739 0.9997924 30
> 5    3     1 -0.17915298 0.9720036 30
> 6    3     2 -0.13942702 0.8166447 30
> 7    4     1  0.08761900 0.9046908 30
> 8    4     2  0.27103640 0.7692970 30
> 9    5     1  0.08017377 1.1537611 30
> 10   5     2  0.01475674 1.0598336 30
> 11   6     1  0.29208572 0.8006171 30
> 12   6     2  0.10239509 1.1632274 30
> 13   7     1 -0.35550603 1.2016190 30
> 14   7     2 -0.33692452 1.0458184 30
> 15   8     1 -0.03779253 1.0385098 30
> 16   8     2 -0.18652758 1.1768540 30
>
> with(df, summarize(cbind(Y),
>                    llist(LAB),
>                    FUN = g,
>                    stat.name=c("mean", "stdev", "n")))
>
>   LAB        mean     stdev  n
> 1   1  0.14335900 1.0454666 60
> 2   2 -0.24555892 0.8983465 60
> 3   3 -0.15929000 0.8902766 60
> 4   4  0.17932770 0.8377011 60
> 5   5  0.04746526 1.0988603 60
> 6   6  0.19724041 0.9946316 60
> 7   7 -0.34621527 1.1168682 60
> 8   8 -0.11216005 1.1029466 60
>
> Once you write the summary function g, it's not that complex. See
> ?summarize in the Hmisc package for more detail. Also, you might take a
> look at the doBy and reshape packages.
With the reshape package, I'd do it like this:
library(reshape)
df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240), Y = rnorm(480))
dfm <- melt(df, measured="Y")
cast(dfm, LAB ~ ., c(mean, sd, length))
cast(dfm, LAB + BATCH ~ ., c(mean, sd, length))
cast(dfm, LAB + BATCH ~ ., c(mean, sd, length), margins=TRUE)
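
For comparison, here is a rough base R sketch of the LAB + BATCH table that needs no extra packages. It is not part of either approach above, only an illustration: the helper name stats and the set.seed() call are my own additions (the seed is there purely so the numbers can be reproduced), and the result columns come out as Y.mean, Y.sd and Y.n rather than with the names shown earlier.

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(LAB = rep(1:8, each = 60), BATCH = rep(c(1, 2), 240),
                 Y = rnorm(480))

# Hypothetical helper: returns a named vector, so aggregate() stores
# the three statistics in a single matrix column called Y
stats <- function(y) c(mean = mean(y), sd = sd(y), n = length(y))
res <- aggregate(Y ~ LAB + BATCH, data = df, FUN = stats)

# Flatten the matrix column into ordinary columns Y.mean, Y.sd, Y.n
res <- do.call(data.frame, res)

# Sort so the rows appear in LAB, then BATCH order
res <- res[order(res$LAB, res$BATCH), ]
res
```

Dropping BATCH from the formula (Y ~ LAB) should give the per-LAB table in the same way.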
Regards,
Hadley