Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Dec 14 22:58:41 CET 2006

steve wrote:
> If I use latex(summary(X)) where X is a data frame with four
> variables I get something like
>     Rainfall       Education         Popden        Nonwhite    
>  Min.   :10.00   Min.   : 9.00   Min.   :1441   Min.   : 0.80  
>  1st Qu.:32.75   1st Qu.:10.40   1st Qu.:3104   1st Qu.: 4.95  
>  Median :38.00   Median :11.05   Median :3567   Median :10.40  
>  Mean   :37.37   Mean   :10.97   Mean   :3866   Mean   :11.87  
>  3rd Qu.:43.25   3rd Qu.:11.50   3rd Qu.:4520   3rd Qu.:15.65  
>  Max.   :60.00   Max.   :12.30   Max.   :9699   Max.   :38.50  
> where the row headings are repeated four times.
> Is there an easy way to get a nicely formatted table,
> something like
>         Rainfall     Education   Popden    Nonwhite    
>  Min.     10.00       9.00        1441        0.80  
>  1st Qu.  32.75      10.40        3104        4.95  
>  Median   38.00      11.05        3567       10.40  
>  Mean     37.37      10.97        3866       11.87  
>  3rd Qu.  43.25      11.50        4520       15.65  
>  Max.     60.00      12.30        9699       38.50  
Hmm, no. Not without further ado. The function summary.data.frame 
produces a table with character entries like "Min. : 1.00 ".
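To see where the repeated labels come from, here is a quick check on the built-in airquality data (the column choice is only for illustration). The result of summary() on a data frame is a "table" of character strings, and each cell already contains its row label, so anything that prints it cell by cell will repeat the labels:

```r
# summary() on a data frame returns a "table" of character strings,
# each cell combining the statistic's label and its formatted value
s <- summary(airquality[, c("Ozone", "Wind")])
class(s)     # "table"
typeof(s)    # "character"
s[1, 1]      # the label "Min." is part of the string itself
```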

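To do better, first note that it can only possibly work for purely 
numeric data frames. If you have one of those, you might base 
something on sapply(X, summary), except that it breaks if only some 
columns have NA's: summary() then returns 7 values (including "NA's") 
for those columns and 6 for the others, so sapply() can't simplify the 
result to a matrix and returns a list instead. Here's an idea: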

> my.summary <- function(x){s <- summary(x); if (length(s)==6) 
       c(s,"NA's"=0) else s}
> sapply(airquality,my.summary)
         Ozone Solar.R   Wind  Temp Month  Day
Min.      1.00     7.0  1.700 56.00 5.000  1.0
1st Qu.  18.00   115.8  7.400 72.00 6.000  8.0
Median   31.50   205.0  9.700 79.00 7.000 16.0
Mean     42.13   185.9  9.958 77.88 6.993 15.8
3rd Qu.  63.25   258.8 11.500 85.00 8.000 23.0
Max.    168.00   334.0 20.700 97.00 9.000 31.0
NA's     37.00     7.0  0.000  0.00 0.000  0.0
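A minimal illustration of why the zero padding matters, on a small made-up data frame where only one column has an NA. Plain sapply(X, summary) falls back to a list, while my.summary (repeated here so the snippet runs on its own) gives a proper 7-row matrix:

```r
# Made-up example: column a has one NA, column b has none
X <- data.frame(a = c(1, 2, NA, 4), b = c(5, 6, 7, 8))

res <- sapply(X, summary)
lengths(res)     # a: 7 (includes "NA's"), b: 6
is.list(res)     # TRUE: sapply() could not simplify to a matrix

my.summary <- function(x) {
  s <- summary(x)
  # pad columns without NAs with an explicit zero count
  if (length(s) == 6) c(s, "NA's" = 0) else s
}
res2 <- sapply(X, my.summary)
dim(res2)        # 7 rows, 2 columns
res2["NA's", ]   # a: 1, b: 0
```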

However, there's still an issue: the NA counts get displayed with the 
same number of decimals as the rest of their column (e.g. 0.000 for 
Wind and Month)...
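One possible workaround, sketched under the assumption that every column is numeric: format the six statistics and the NA counts separately, so the counts come out as whole numbers. The rounding to 4 significant digits is my own choice, meant to roughly mimic how summary() values are printed by default; the result is a character matrix rather than numbers, so further arithmetic on it is not intended:

```r
# Sketch: format statistics and NA counts separately (numeric columns only)
my.summary <- function(x) {
  s <- summary(x)
  if (length(s) == 6) c(s, "NA's" = 0) else s
}

tab <- sapply(airquality, my.summary)

# Rows 1-6: round to 4 significant digits, then format each column
# with a common number of decimals
stats <- apply(tab[1:6, , drop = FALSE], 2,
               function(col) format(signif(col, 4)))

# Row 7: NA counts formatted as integers
out <- rbind(stats, "NA's" = format(as.integer(tab["NA's", ])))

print(out, quote = FALSE, right = TRUE)
```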

