[R] Nicely formatted summary table with mean, standard deviation or number and proportion

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon May 14 04:11:51 CEST 2007


Keith Wong wrote:
> Dear all,
> 
> The incredibly useful Hmisc package provides a method to generate 
> summary tables that can be typeset in LaTeX. The Alzola and Harrell book 
> "An introduction to S and the Hmisc and Design libraries" provides an 
> example that generates mean and quartiles for continuous variables, and 
> numbers and percentages for count variables: summary() with method = 
> 'reverse'.
> 
> I wonder if there is a way to change it so the mean and standard 
> deviation are reported instead for continuous variables.
> 
> I illustrate my question below using an example from the book.
> 
> Thank you.
> 
> Keith

Newer versions of Hmisc have an option (prmsd=TRUE in the print and latex 
methods) to add mean and SD for method='reverse'.  Quartiles are always there.
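
A minimal sketch of how that option might be used, reusing the simulated 
data from the question. It assumes an installed Hmisc version whose print 
and latex methods for method='reverse' summaries accept prmsd; check 
?summary.formula for your version. The digits value is only an 
illustration.

```r
library(Hmisc)

# Simulated data as in the original question
set.seed(173)
sex       <- factor(sample(c("m", "f"), 500, replace = TRUE))
age       <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))

# Same summary object as before; mean and SD are requested at print time
x <- summary(treatment ~ age + sex, method = "reverse")

# Quartiles are still shown; prmsd = TRUE appends mean and SD
# for continuous variables
print(x, prmsd = TRUE, digits = 3)

# The same argument applies when typesetting;
# file = "" writes the LaTeX source to the console
latex(x, prmsd = TRUE, file = "")
```

The mean and SD are added alongside the quartiles rather than replacing 
them.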

Frank

> 
> 
>  > ####
>  > library(Hmisc)
>  >
>  > set.seed(173)
>  > sex = factor(sample(c("m", "f"), 500, replace = TRUE))
>  > age = rnorm(500, 50, 5)
>  > treatment = factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
>  > summary(sex ~ treatment, fun = table)
> sex    N=500
> 
> +---------+-------+---+---+---+
> |         |       |N  |f  |m  |
> +---------+-------+---+---+---+
> |treatment|Drug   |263|140|123|
> |         |Placebo|237|133|104|
> +---------+-------+---+---+---+
> |Overall  |       |500|273|227|
> +---------+-------+---+---+---+
>  >
>  >
>  >
>  > (x = summary(treatment ~ age + sex, method = "reverse"))
>  > # generates quartiles for continuous variables
> 
> 
> Descriptive Statistics by treatment
> 
> +-------+--------------+--------------+
> |       |Drug          |Placebo       |
> |       |(N=263)       |(N=237)       |
> +-------+--------------+--------------+
> |age    |46.5/49.9/53.2|46.7/50.0/53.4|
> +-------+--------------+--------------+
> |sex : m|   47% (123)  |   44% (104)  |
> +-------+--------------+--------------+
>  >
>  >
>  > # latex(x) generates a very nicely formatted table
>  > # but I'd like "mean (standard deviation)" instead of quartiles.
> 
> 
> 
>  > # this function from http://tolstoy.newcastle.edu.au/R/e2/help/06/11/4713.html
>  > g <- function(y) {
> +   s <- apply(y, 2,
> +              function(z) {
> +                z <- z[!is.na(z)]
> +                n <- length(z)
> +                if(n==0) c(N=0, Mean=NA, SD=NA) else
> +                if(n==1) c(N=1, Mean=z, SD=NA) else {
> +                  m <- mean(z)
> +                  s <- sd(z)
> +                  c(N=n, Mean=m, SD=s)
> +                }
> +              })
> +   w <- as.vector(s)
> +   names(w) <-  as.vector(outer(rownames(s), colnames(s), paste, sep=''))
> +   w
> + }
> 
>  >
>  > summary(treatment ~ age + sex, method = "reverse", fun = g)
>  > # does not work: the 'fun' or 'FUN' argument is ignored.
> 
> 
> Descriptive Statistics by treatment
> 
> +-------+--------------+--------------+
> |       |Drug          |Placebo       |
> |       |(N=263)       |(N=237)       |
> +-------+--------------+--------------+
> |age    |46.5/49.9/53.2|46.7/50.0/53.4|
> +-------+--------------+--------------+
> |sex : m|   47% (123)  |   44% (104)  |
> +-------+--------------+--------------+
>  >
>  >
>  > (x1 = summarize(cbind(age), llist(treatment), FUN = g, stat.name=c("n", "mean", "sd")))
>   treatment   n mean   sd
> 1      Drug 263 49.9 4.94
> 2   Placebo 237 50.1 4.97
>  >
>  > # this works, but the table is rotated and count data have to be
>  > # handled separately.
> 
> 
> 
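
For the layout asked about above (variables in rows, treatment groups in 
columns, "mean (SD)" for continuous variables and "percent (n)" for 
counts), here is a rough sketch in base R only. It is not part of Hmisc; 
the helper names mean_sd and pct_n are made up for illustration, and the 
result is a plain character matrix rather than a LaTeX table.

```r
# Simulated data as in the original question
set.seed(173)
sex       <- factor(sample(c("m", "f"), 500, replace = TRUE))
age       <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))

# Hypothetical helper: "mean (SD)" of x within each level of g
mean_sd <- function(x, g, digits = 1) {
  sapply(split(x, g), function(z) {
    z <- z[!is.na(z)]
    sprintf("%.*f (%.*f)", digits, mean(z), digits, sd(z))
  })
}

# Hypothetical helper: "pct% (n)" for one level of factor f within each level of g
pct_n <- function(f, g, level) {
  sapply(split(f, g), function(z) {
    z <- z[!is.na(z)]
    n <- sum(z == level)
    sprintf("%.0f%% (%d)", 100 * n / length(z), n)
  })
}

# One row per variable, one column per treatment group
tab <- rbind("age"     = mean_sd(age, treatment),
             "sex : m" = pct_n(sex, treatment, "m"))
colnames(tab) <- sprintf("%s (N=%d)", levels(treatment),
                         as.vector(table(treatment)))
print(noquote(tab))
```

Because both helpers return one string per treatment level, continuous 
and count variables can be stacked with rbind in the same table, which 
avoids the rotated layout that summarize() produces.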


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


