[R] Better way to create tables of mean & standard deviations
Chuck Cleland
ccleland at optonline.net
Tue Nov 7 13:59:48 CET 2006
Benjamin Dickgiesser wrote:
> Hi
>
> I'm trying to create tables of means, standard deviations and numbers
> of observations (i) for each laboratory, (ii) for each batch number,
> and (iii) for each batch at each laboratory for the attached data.
>
> I created these functions:
> summary.aggregate <- function(y, label, ...)
> {
>     temp.mean   <- aggregate(y, FUN=mean, ...)
>     temp.sd     <- aggregate(y, FUN=sd, ...)
>     temp.length <- aggregate(y, FUN=length, ...)
>     txtlabs <- makeLabel(label, length(temp.mean$x))
>
>     temp <- data.frame(mean=temp.mean$x, stdev=temp.sd$x,
>                        n=temp.length$x, row.names=txtlabs)
> }
>
> makeLabel <- function(label, llength, increaseLag=FALSE)
> {
>     x <- c()
>     for(cnt in 1:llength)
>     {
>         if(increaseLag == TRUE && mode(cnt/2))
>         {
>
>         }
>         x[cnt] <- paste(label, cnt)
>     }
>     x
> }
>
> and can use the following commands to create tables of means etc.
>
> print(summary.aggregate(data.ceramic$Y,"Lab",by=list(data.ceramic$Lab)))
>
> to create output like this:
>
>           mean    stdev  n
> Lab 1 645.6125 65.94129 60
> Lab 2 655.2121 70.64094 60
> Lab 3 633.3161 80.48620 60
> Lab 4 650.3897 77.59191 60
> Lab 5 630.4955 84.98888 60
> Lab 6 656.2608 66.16100 60
> Lab 7 666.1775 74.39796 60
> Lab 8 663.1543 71.10769 60
>
>
> The purpose of the first function is to calculate the mean, stdev etc.,
> and the second simply creates a labelling vector, e.g.
> c("Lab 1", "Lab 2", ..., "Lab 8").
>
>
>
> This seems rather complex to me for what I am trying to achieve. Is
> there a better way to do this?
> Also I am having some trouble getting the labelling right for iii
> since it should look like:
>
>       Batch     mean    stdev  n
> Lab 1     1 686.7179 53.37582 30
> Lab 1     2 695.8710 62.08583 30
> Lab 2     1 654.5317 94.19746 30
> Lab 2     2 702.9095 51.44984 30
> Lab 3     1 676.2975 69.13784 30
> Lab 3     2 692.1952 57.27212 30
> Lab 4     1 700.8995 56.91608 30
> Lab 4     2 702.5668 62.36488 30
> Lab 5     1 604.5070 50.01621 30
> Lab 5     2 614.5532 53.64149 30
> Lab 6     1 612.1006 58.09503 30
> Lab 6     2 597.8699 62.40710 30
> Lab 7     1 584.6934 74.66537 30
> Lab 7     2 620.3263 54.34871 30
> Lab 8     1 631.4555 74.34480 30
> Lab 8     2 623.7419 56.42492 30
>
> Currently I'm using:
>
> temp <- summary.aggregate(data.ceramic$Y, "Lab",
>                           by=list(data.ceramic$Lab, data.ceramic$Batch))
>
> batchcnt <- c(1,2)
> print(data.frame(Batc=batchcnt, temp))
>
> But that produces this output:
>        Batc     mean    stdev  n
> Lab 1     1 686.7179 53.37582 30
> Lab 2     2 695.8710 62.08583 30
> Lab 3     1 654.5317 94.19746 30
> Lab 4     2 702.9095 51.44984 30
> Lab 5     1 676.2975 69.13784 30
> Lab 6     2 692.1952 57.27212 30
> Lab 7     1 700.8995 56.91608 30
> Lab 8     2 702.5668 62.36488 30
> Lab 9     1 604.5070 50.01621 30
> Lab 10    2 614.5532 53.64149 30
> Lab 11    1 612.1006 58.09503 30
> Lab 12    2 597.8699 62.40710 30
> Lab 13    1 584.6934 74.66537 30
> Lab 14    2 620.3263 54.34871 30
> Lab 15    1 631.4555 74.34480 30
> Lab 16    2 623.7419 56.42492 30
>
> I can only think of rather complex ways to solve the labeling issue...
>
> I would appreciate it if someone could point out if there are
> better/cleaner/easier ways of achieving what I'm trying to do.
Does this help?
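g <- function(y) {
  # y is a matrix with one column per response variable
  s <- apply(y, 2,
             function(z) {
               z <- z[!is.na(z)]     # drop missing values
               n <- length(z)
               # always return three named values so every group has the same shape
               if(n==0) c(Mean=NA, SD=NA, N=0) else
               if(n==1) c(Mean=z, SD=NA, N=1) else {
                 m <- mean(z)
                 s <- sd(z)
                 c(Mean=m, SD=s, N=n)
               }
             })
  # flatten to a named vector: statistic name pasted onto the column name
  w <- as.vector(s)
  names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep=''))
  w
}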

df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
                 Y = rnorm(480))

library(Hmisc)

with(df, summarize(cbind(Y),
                   llist(LAB, BATCH),
                   FUN = g,
                   stat.name=c("mean", "stdev", "n")))

   LAB BATCH        mean     stdev  n
1    1     1  0.13467569 1.0623188 30
2    1     2  0.15204232 1.0464287 30
3    2     1 -0.14470044 0.7881942 30
4    2     2 -0.34641739 0.9997924 30
5    3     1 -0.17915298 0.9720036 30
6    3     2 -0.13942702 0.8166447 30
7    4     1  0.08761900 0.9046908 30
8    4     2  0.27103640 0.7692970 30
9    5     1  0.08017377 1.1537611 30
10   5     2  0.01475674 1.0598336 30
11   6     1  0.29208572 0.8006171 30
12   6     2  0.10239509 1.1632274 30
13   7     1 -0.35550603 1.2016190 30
14   7     2 -0.33692452 1.0458184 30
15   8     1 -0.03779253 1.0385098 30
16   8     2 -0.18652758 1.1768540 30

with(df, summarize(cbind(Y),
                   llist(LAB),
                   FUN = g,
                   stat.name=c("mean", "stdev", "n")))

  LAB        mean     stdev  n
1   1  0.14335900 1.0454666 60
2   2 -0.24555892 0.8983465 60
3   3 -0.15929000 0.8902766 60
4   4  0.17932770 0.8377011 60
5   5  0.04746526 1.0988603 60
6   6  0.19724041 0.9946316 60
7   7 -0.34621527 1.1168682 60
8   8 -0.11216005 1.1029466 60

Once you write the summary function g, it's not that complex. See
?summarize in the Hmisc package for more detail. Also, you might take a
look at the doBy and reshape packages.
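
If you would rather not depend on an extra package, the same kind of table can probably be built with base R alone. The following is only a rough sketch, not a replacement for the summarize() approach above: it assumes a version of R in which aggregate() accepts a formula, it reuses the simulated layout of df from above (the seed and values are made up for illustration), and res is just a name chosen here. Because aggregate() keeps the grouping columns, no hand-made labels are needed; the result is sorted explicitly so that batches appear within labs.

```r
# Simulated data with the same layout as df above (illustrative values only).
set.seed(1)
df <- data.frame(LAB = rep(1:8, each = 60),
                 BATCH = rep(c(1, 2), 240),
                 Y = rnorm(480))

# FUN returns a named vector, so aggregate() stores the three statistics
# as a matrix column called Y.
res <- aggregate(Y ~ LAB + BATCH, data = df,
                 FUN = function(z) c(mean = mean(z), stdev = sd(z), n = length(z)))

# Flatten the matrix column into ordinary columns, then sort by lab and batch.
res <- do.call(data.frame, res)
names(res) <- c("LAB", "BATCH", "mean", "stdev", "n")
res <- res[order(res$LAB, res$BATCH), ]
rownames(res) <- NULL
res
```

Using Y ~ LAB or Y ~ BATCH as the formula (and adjusting the names() line to match) should give the per-laboratory and per-batch tables in the same way.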
> Benjamin
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894