[R] Calculating Summaries for each level of a Categorical variable

Christos Argyropoulos argchris at hotmail.com
Sun Jun 27 14:36:56 CEST 2010


Hi Raoul,
I presume you need these summaries for a table of descriptive statistics for a thesis/report/paper
("Table 1", as medical researchers informally call it). If so, pass method="reverse" to
summary.formula. In the small example below, I create 4 groups of patients, record 2
characteristics per patient (age and sex), and use summary.formula to summarize these
characteristics by group. Testing for differences between groups is optional but
is included for completeness. If this is the kind of table you are after, I strongly advise you to
spend some time experimenting with summary.formula and to read:

Harrell FE (2004): Statistical tables and plots using S and LaTeX
(available from http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf)

The 2-3 hours you will need to familiarize yourself with this package are well worth spending
(especially if you are going to call LaTeX on the output). If you are a Windows user, copy and
paste the output of the print function into Excel or OpenOffice and use the Text to Columns feature
of either program to format the output into a table that can be used inside a manuscript.
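
As an alternative to copy and paste, a table of this kind can also be written to a tab-delimited
file, which Excel and OpenOffice open directly. The following is only a minimal base-R sketch: it
does not use Hmisc, and the data, the column names and the file name are all made up for
illustration.

```r
## Hypothetical example data (not the data used elsewhere in this message)
set.seed(1)
d <- data.frame(
  grp = factor(rep(paste("Group", 1:2), each = 5)),
  age = round(rlnorm(10, 4, 0.1))
)

## Lower quartile / median / upper quartile of age within each group;
## the result is a matrix with one column per group
q <- sapply(split(d$age, d$grp), quantile, probs = c(0.25, 0.5, 0.75))

## One row per group, one column per statistic
tab <- data.frame(group = colnames(q), Q1 = q[1, ], median = q[2, ], Q3 = q[3, ])

## Tab-delimited text opens directly in a spreadsheet program
out <- file.path(tempdir(), "table1.txt")   # hypothetical file name
write.table(tab, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
```

The file is written to the R session's temporary directory here; change `out` to any path you prefer.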

Christos

## R-code follows

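library(Hmisc)
## One baseline factor (e.g. patient group)
## levels= is given explicitly so that factor() does not fail if a group
## happens to be absent from the random sample
grp<-round(runif(20,1,4))
grp<-factor(grp,levels=1:4,labels=paste("Group",1:4))

## Another factor (e.g. sex)
sex<-round(runif(20,1,2))
sex<-factor(sex,levels=1:2,labels=c("Male","Female"))

## A continuous variable (e.g. age), drawn from a log-normal distribution
age<-rlnorm(20,4,.1)

## A data frame
data<-data.frame(age=age,grp=grp,sex=sex)

## Table 1: sex and age summarized by group, with an overall ("Combined")
## column and a test statistic for each row
sm<-summary(grp~sex+age,data=data,method="reverse",overall=TRUE,test=TRUE)
print(sm,digits=2,exclude1=FALSE)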

Descriptive Statistics by grp

+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|          |Group 1           |Group 2           |Group 3           |Group 4           |Combined          |  Test                      |
|          |(N=3)             |(N=6)             |(N=8)             |(N=3)             |(N=20)            |Statistic                   |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|sex : Male|          67% ( 2)|          67% ( 4)|          25% ( 2)|          67% ( 2)|          50% (10)|Chi-square=3.3 d.f.=3 P=0.34|
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|    Female|          33% ( 1)|          33% ( 2)|          75% ( 6)|          33% ( 1)|          50% (10)|                            |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|age       |          60/62/65|          51/55/60|          46/51/57|          46/48/52|          49/54/60|   F=2.9 d.f.=3,16 P=0.068  |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
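
To see where the numbers in the "sex" rows come from, they can be recomputed in base R from the
counts printed in the table. This is only a sketch for checking the arithmetic, not a description
of how Hmisc works internally; the object names (counts, pct, chi) are my own. The "age" row
cannot be checked in the same way, because the simulated ages are not shown.

```r
## Counts of males and females per group, copied from the table above
counts <- matrix(c(2, 1,   # Group 1: male, female
                   4, 2,   # Group 2
                   2, 6,   # Group 3
                   2, 1),  # Group 4
                 nrow = 2,
                 dimnames = list(sex = c("Male", "Female"),
                                 grp = paste("Group", 1:4)))

## Column percentages: share of each sex within each group
pct <- round(100 * prop.table(counts, margin = 2))

## Pearson chi-square test of independence between sex and group; the
## warning about small expected counts is suppressed because this is
## only an arithmetic check
chi <- suppressWarnings(chisq.test(counts))

pct["Male", ]                    # 67 67 25 67
round(unname(chi$statistic), 1)  # 3.3
unname(chi$parameter)            # 3 degrees of freedom
round(chi$p.value, 2)            # 0.34
```

These values agree with the percentages and with "Chi-square=3.3 d.f.=3 P=0.34" in the table.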

  

> Date: Sat, 26 Jun 2010 21:48:05 -0700
> From: raoul.t.dsouza at gmail.com
> To: r-help at r-project.org
> Subject: Re: [R] Calculating Summaries for each level of a Categorical variable
> 
> 
> Hi Christos,
> 
> Thanks for this. I had a look at summary.formula in the Hmisc package and it
> is extremely complicated for me. I am still trying to work out how I could use
> it.
> 
> Regards,
> Raoul