[R] Nicely formatted summary table with mean, standard deviation or number and proportion

Keith Wong keithw at med.usyd.edu.au
Mon May 14 03:56:45 CEST 2007


Dear all,

The incredibly useful Hmisc package provides a method to generate 
summary tables that can be typeset in LaTeX. The Alzola and Harrell book 
"An Introduction to S and the Hmisc and Design Libraries" provides an 
example that uses summary() with method = 'reverse' to report quartiles 
for continuous variables, and numbers and percentages for categorical 
variables.

I wonder if there is a way to change it so the mean and standard 
deviation are reported instead for continuous variables.

I illustrate my question below using an example from the book.

Thank you.

Keith


 > ####
 > library(Hmisc)
 >
 > set.seed(173)
 > sex = factor(sample(c("m", "f"), 500, replace = TRUE))
 > age = rnorm(500, 50, 5)
 > treatment = factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
 > summary(sex ~ treatment, fun = table)
sex    N=500

+---------+-------+---+---+---+
|         |       |N  |f  |m  |
+---------+-------+---+---+---+
|treatment|Drug   |263|140|123|
|         |Placebo|237|133|104|
+---------+-------+---+---+---+
|Overall  |       |500|273|227|
+---------+-------+---+---+---+
 >
 >
 >
 > (x = summary(treatment ~ age + sex, method = "reverse"))
 > # generates quartiles for continuous variables


Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=263)       |(N=237)       |
+-------+--------------+--------------+
|age    |46.5/49.9/53.2|46.7/50.0/53.4|
+-------+--------------+--------------+
|sex : m|   47% (123)  |   44% (104)  |
+-------+--------------+--------------+
 >
 >
 > # latex(x) generates a very nicely formatted table
 > # but I'd like "mean (standard deviation)" instead of quartiles.



 > # this function is from http://tolstoy.newcastle.edu.au/R/e2/help/06/11/4713.html
 > g <- function(y) {
+   s <- apply(y, 2,
+              function(z) {
+                z <- z[!is.na(z)]
+                n <- length(z)
+                if(n==0) c(N=0, Mean=NA, SD=NA) else
+                if(n==1) c(N=1, Mean=z, SD=NA) else {
+                  m <- mean(z)
+                  s <- sd(z)
+                  c(N=n, Mean=m, SD=s)
+                }
+              })
+   w <- as.vector(s)
+   names(w) <-  as.vector(outer(rownames(s), colnames(s), paste, sep=''))
+   w
+ }

 >
 > summary(treatment ~ age + sex, method = "reverse", fun = g)
 > # does not work: the 'fun' (or 'FUN') argument is ignored.


Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=263)       |(N=237)       |
+-------+--------------+--------------+
|age    |46.5/49.9/53.2|46.7/50.0/53.4|
+-------+--------------+--------------+
|sex : m|   47% (123)  |   44% (104)  |
+-------+--------------+--------------+
 >
 >
 > (x1 = summarize(cbind(age), llist(treatment), FUN = g,
+                 stat.name=c("n", "mean", "sd")))
  treatment   n mean   sd
1      Drug 263 49.9 4.94
2   Placebo 237 50.1 4.97
 >
 > # this works, but the table is rotated, and count data has to be
 > # treated separately.
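
To make the target layout concrete, here is a minimal base-R sketch of such a table: one column per treatment group, "mean (SD)" for numeric variables, and "percent (count)" for each level of a categorical variable. The helper name mean_sd_table() and its arguments are hypothetical, not part of Hmisc, and the result is a plain character matrix rather than an object that latex() can typeset.

```r
# Hypothetical helper (NOT an Hmisc function): builds a character matrix
# with one column per group, "mean (SD)" rows for numeric variables and
# "pct% (n)" rows for each level of non-numeric variables.
mean_sd_table <- function(data, group, digits = 1) {
  g <- factor(data[[group]])
  vars <- setdiff(names(data), group)
  rows <- list()
  for (v in vars) {
    x <- data[[v]]
    if (is.numeric(x)) {
      # Per-group mean and standard deviation, ignoring missing values
      m <- as.vector(tapply(x, g, mean, na.rm = TRUE))
      s <- as.vector(tapply(x, g, sd, na.rm = TRUE))
      rows[[v]] <- sprintf("%s (%s)",
                           formatC(m, format = "f", digits = digits),
                           formatC(s, format = "f", digits = digits))
    } else {
      tab <- table(factor(x), g)        # levels in rows, groups in columns
      pct <- 100 * prop.table(tab, 2)   # percentages within each group
      for (lev in rownames(tab)) {
        rows[[paste(v, ":", lev)]] <-
          sprintf("%.0f%% (%d)", pct[lev, ], tab[lev, ])
      }
    }
  }
  out <- do.call(rbind, rows)
  colnames(out) <- paste0(levels(g), " (N=", as.vector(table(g)), ")")
  noquote(out)
}

# Simulated data in the same spirit as the example above; the exact
# numbers depend on the R version's random number generator.
set.seed(173)
sex <- factor(sample(c("m", "f"), 500, replace = TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
tab <- mean_sd_table(data.frame(treatment, age, sex), "treatment")
print(tab)
```

Unlike method = "reverse", this sketch prints every level of a factor (both "sex : f" and "sex : m") and does no testing or LaTeX formatting; it only illustrates the orientation and cell contents being asked for.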



-- 
Keith Wong
PhD candidate
Sleep & Circadian Research Group
Woolcock Institute of Medical Research

email   keithw at med.usyd.edu.au
Phone   +61 2 9515 8981
Fax     +61 2 9515 7070
Mail    PO Box M77, Missenden Road NSW 2050, Australia


