[R] Better way to create tables of mean & standard deviations

Tue Nov 7 13:59:48 CET 2006

Benjamin Dickgiesser wrote:
> Hi
> 
> I'm trying to create tables of means, standard deviations and numbers
> of observations (i) for each laboratory, (ii) for each batch number,
> and (iii) for each batch at each laboratory, for the attached data.
> 
> I created these functions:
> summary.aggregate <- function(y, label, ...)
> {
>     temp.mean   <- aggregate(y, FUN=mean, ...)
>     temp.sd     <- aggregate(y, FUN=sd, ...)
>     temp.length <- aggregate(y, FUN=length, ...)
>     txtlabs <- makeLabel(label, length(temp.mean$x))
>
>     temp <- data.frame(mean=temp.mean$x, stdev=temp.sd$x,
>                        n=temp.length$x, row.names=txtlabs)
> }
> makeLabel <- function(label,llength,increaseLag=FALSE)
> {
>     x <- c()
>     for(cnt in 1:llength)
>     {
>         if(increaseLag == TRUE && mode(cnt/2))
>         {
>
>         }
>         x[cnt] <- paste(label,cnt)
>     }
>     x
> }
> 
> and can use the following commands to create tables of means etc.
> 
> print(summary.aggregate(data.ceramic$Y,"Lab",by=list(data.ceramic$Lab)))
> 
> to create output like this:
> 
>          mean    stdev  n
> Lab 1 645.6125 65.94129 60
> Lab 2 655.2121 70.64094 60
> Lab 3 633.3161 80.48620 60
> Lab 4 650.3897 77.59191 60
> Lab 5 630.4955 84.98888 60
> Lab 6 656.2608 66.16100 60
> Lab 7 666.1775 74.39796 60
> Lab 8 663.1543 71.10769 60
> 
> 
> The purpose of the first function is to calculate the mean, stdev etc.,
> and the second simply creates a labelling vector, e.g. c("Lab 1",
> "Lab 2", ..., "Lab 8").
> 
> 
> 
> This seems rather complex to me for what I am trying to achieve. Is
> there a better way to do this?
> Also, I am having some trouble getting the labelling right for (iii),
> since it should look like:
> 
>       Batch    mean    stdev  n
> Lab 1     1 686.7179 53.37582 30
> Lab 1     2 695.8710 62.08583 30
> Lab 2     1 654.5317 94.19746 30
> Lab 2     2 702.9095 51.44984 30
> Lab 3     1 676.2975 69.13784 30
> Lab 3     2 692.1952 57.27212 30
> Lab 4     1 700.8995 56.91608 30
> Lab 4     2 702.5668 62.36488 30
> Lab 5     1 604.5070 50.01621 30
> Lab 5     2 614.5532 53.64149 30
> Lab 6     1 612.1006 58.09503 30
> Lab 6     2 597.8699 62.40710 30
> Lab 7     1 584.6934 74.66537 30
> Lab 7     2 620.3263 54.34871 30
> Lab 8     1 631.4555 74.34480 30
> Lab 8     2 623.7419 56.42492 30
> 
> Currently I'm using:
> temp <- summary.aggregate(data.ceramic$Y, "Lab",
>                           by=list(data.ceramic$Lab, data.ceramic$Batch))
> 
> batchcnt <- c(1,2)
> print(data.frame(Batc=batchcnt,temp))
> 
> But that produces this output:
>       Batc     mean    stdev  n
> Lab 1     1 686.7179 53.37582 30
> Lab 2     2 695.8710 62.08583 30
> Lab 3     1 654.5317 94.19746 30
> Lab 4     2 702.9095 51.44984 30
> Lab 5     1 676.2975 69.13784 30
> Lab 6     2 692.1952 57.27212 30
> Lab 7     1 700.8995 56.91608 30
> Lab 8     2 702.5668 62.36488 30
> Lab 9     1 604.5070 50.01621 30
> Lab 10    2 614.5532 53.64149 30
> Lab 11    1 612.1006 58.09503 30
> Lab 12    2 597.8699 62.40710 30
> Lab 13    1 584.6934 74.66537 30
> Lab 14    2 620.3263 54.34871 30
> Lab 15    1 631.4555 74.34480 30
> Lab 16    2 623.7419 56.42492 30
> 
> I can only think of rather complex ways to solve the labelling issue...
>
> I would appreciate it if someone could point out better/cleaner/easier
> ways of achieving what I'm trying to do.

  Does this help?

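# Summary function for summarize(): y is a matrix with one column per
# variable; returns mean, SD and N for each column as a named vector.
g <- function(y) {
  s <- apply(y, 2,
             function(z) {
               z <- z[!is.na(z)]          # drop missing values
               n <- length(z)
               # Always return three values so they line up with stat.name.
               if (n == 0) c(Mean = NA, SD = NA, N = 0) else
               if (n == 1) c(Mean = z,  SD = NA, N = 1) else
                 c(Mean = mean(z), SD = sd(z), N = n)
             })
  w <- as.vector(s)
  names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep = ''))
  w
}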
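
  One point worth checking in a function like g: every group has to hand
back the same number of values, in the order given to stat.name, even when
a group is empty or has a single observation. A minimal sketch of that
per-group logic on a plain vector (stats3 is a hypothetical name used only
for illustration; it is not part of Hmisc):

```r
# Hypothetical helper: mean, SD and N of one vector, always length 3.
stats3 <- function(z) {
  z <- z[!is.na(z)]                       # drop missing values
  n <- length(z)
  if (n == 0) return(c(Mean = NA, SD = NA, N = 0))  # nothing to summarise
  if (n == 1) return(c(Mean = z,  SD = NA, N = 1))  # SD undefined for n = 1
  c(Mean = mean(z), SD = sd(z), N = n)
}

full   <- stats3(c(1, 2, 3, NA))   # NA is dropped, so N is 3
single <- stats3(5)                # one observation: SD is NA
empty  <- stats3(numeric(0))       # no observations: only N is known
```

  All three results have the same length and names, so they can be stacked
into one table without the columns shifting.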

df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
                 Y = rnorm(480))

library(Hmisc)

with(df, summarize(cbind(Y),
                   llist(LAB, BATCH),
                   FUN = g,
                   stat.name=c("mean", "stdev", "n")))

   LAB BATCH        mean     stdev  n
1    1     1  0.13467569 1.0623188 30
2    1     2  0.15204232 1.0464287 30
3    2     1 -0.14470044 0.7881942 30
4    2     2 -0.34641739 0.9997924 30
5    3     1 -0.17915298 0.9720036 30
6    3     2 -0.13942702 0.8166447 30
7    4     1  0.08761900 0.9046908 30
8    4     2  0.27103640 0.7692970 30
9    5     1  0.08017377 1.1537611 30
10   5     2  0.01475674 1.0598336 30
11   6     1  0.29208572 0.8006171 30
12   6     2  0.10239509 1.1632274 30
13   7     1 -0.35550603 1.2016190 30
14   7     2 -0.33692452 1.0458184 30
15   8     1 -0.03779253 1.0385098 30
16   8     2 -0.18652758 1.1768540 30

with(df, summarize(cbind(Y),
                   llist(LAB),
                   FUN = g,
                   stat.name=c("mean", "stdev", "n")))

  LAB        mean     stdev  n
1   1  0.14335900 1.0454666 60
2   2 -0.24555892 0.8983465 60
3   3 -0.15929000 0.8902766 60
4   4  0.17932770 0.8377011 60
5   5  0.04746526 1.0988603 60
6   6  0.19724041 0.9946316 60
7   7 -0.34621527 1.1168682 60
8   8 -0.11216005 1.1029466 60

  Once you write the summary function g, it's not that complex.  See
?summarize in the Hmisc package for more detail.  Also, you might take a
look at the doBy and reshape packages.
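
  If you would rather stay in base R, aggregate() can also return several
statistics per group when FUN returns a named vector. A rough sketch,
assuming data laid out like df above (group_stats and dat are made-up names
for illustration, and the data are simulated, so the numbers will not match
the ceramic data):

```r
# Simulated stand-in with the same layout as df above.
set.seed(1)
dat <- data.frame(LAB = rep(1:8, each = 60),
                  BATCH = rep(c(1, 2), 240),
                  Y = rnorm(480))

# Hypothetical helper: one row of mean, stdev and n per group.
group_stats <- function(formula, data) {
  out <- aggregate(formula, data = data,
                   FUN = function(z) c(mean = mean(z), stdev = sd(z),
                                       n = length(z)))
  # aggregate() stores the three statistics in a single matrix column;
  # spread that matrix into ordinary columns next to the grouping columns.
  k <- ncol(out)
  data.frame(out[-k], out[[k]])
}

by_lab   <- group_stats(Y ~ LAB, dat)          # (i)   per laboratory
by_batch <- group_stats(Y ~ BATCH, dat)        # (ii)  per batch
by_both  <- group_stats(Y ~ LAB + BATCH, dat)  # (iii) per lab and batch

# aggregate() lets the first grouping variable vary fastest, so sort the
# rows to get batch 1 and batch 2 of each lab next to each other.
by_both <- by_both[order(by_both$LAB, by_both$BATCH), ]
print(by_both, row.names = FALSE)
```

  Because the grouping variables come back as ordinary columns, there is
no need to build the "Lab 1", "Lab 2", ... labels by hand.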

> Benjamin

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894