[R] Summary tables of large datasets including character and numerical variables

David Winsemius dwinsemius at comcast.net
Tue Dec 27 04:38:43 CET 2011


On Dec 26, 2011, at 5:44 AM, sparandekar wrote:

> Hello !
>
> I am attempting to switch from being a long-time SAS user to R, and
> would really appreciate a bit of help! The first thing I do on getting
> a large dataset (thousands of observations and hundreds of variables)
> is to run the SAS command PROC CONTENTS VARNUM, which gives me a table
> with the name of each variable, its type and length. Then I run PROC
> MEANS, which for numerical variables gives me a table with the number
> of non-missing values, min, max, mean and std. dev. My data usually
> has errors, and this first step helps me to spot the errors and
> 'clean' the dataset.
>
> The 'summary' function in R and other functions in the Hmisc or psych
> packages do not work for me.
>
> How can I get a table from an R data.frame that has the following
> structure (header row and example)?
>
> Rowname  Character/Integer  Length  Non-Missing  Minimum       Maximum       Mean           SD
> HHID     Integer            12      32,344       114455007701  514756007812  2.345 x 10^10  1.456 x 10^10
> Head     Character          38      24,566       -             -             -              -

I generally use (in order of increasing information content and
increasing length of output):

names(dfrm)

str(dfrm)

Hmisc::describe(dfrm)

(Several other packages have their own versions of 'describe'.)
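
If none of those gives the layout asked for, something along the
following lines might be a starting point. It is only a minimal sketch
in base R: the function name contents_table and the example data are
made up for illustration, and "Length" is assumed here to mean the
widest printed value in a column, which may not match what PROC
CONTENTS reports.

```r
# Hypothetical helper (not part of any package): one row per column of 'dfrm'.
contents_table <- function(dfrm) {
  rows <- lapply(names(dfrm), function(nm) {
    x <- dfrm[[nm]]
    if (is.factor(x)) x <- as.character(x)   # treat factors as character
    ok <- x[!is.na(x)]                       # non-missing values only
    has_num <- is.numeric(x) && length(ok) > 0
    data.frame(
      Rowname    = nm,
      Type       = class(x)[1],
      # Assumption: "Length" = widest value when converted to character
      Length     = if (length(ok) > 0) max(nchar(as.character(ok))) else NA_integer_,
      NonMissing = length(ok),
      Minimum    = if (has_num) min(ok)  else NA_real_,
      Maximum    = if (has_num) max(ok)  else NA_real_,
      Mean       = if (has_num) mean(ok) else NA_real_,
      SD         = if (has_num) sd(ok)   else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)   # stack the one-row data frames
}

# Made-up example data, loosely modelled on the question
dfrm <- data.frame(
  HHID = c(114455007701, 514756007812, NA),
  Head = c("A. Kumar", NA, "B. Singh"),
  stringsAsFactors = FALSE
)
contents_table(dfrm)
```

Non-numeric columns get NA rather than "-" for the four statistics, so
the result stays an ordinary data.frame that can be sorted or filtered
when hunting for errors.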

-- 

David Winsemius, MD
West Hartford, CT


