[R] Summary tables of large datasets including character and numerical variables
David Winsemius
dwinsemius at comcast.net
Tue Dec 27 04:38:43 CET 2011
On Dec 26, 2011, at 5:44 AM, sparandekar wrote:
> Hello!
>
> I am attempting to switch from being a long-time SAS user to R, and would
> really appreciate a bit of help! The first thing I do when I get a large
> dataset (thousands of observations and hundreds of variables) is to run the
> SAS command PROC CONTENTS VARNUM, which gives me a table with the name of
> each variable, its type and its length. Then I run PROC MEANS, which for
> numerical variables gives me a table with the number of non-missing values,
> min, max, mean and std. dev. My data usually has errors, and this first
> step helps me to spot them and 'clean' the dataset.
>
> The 'summary' function in R and other functions in the Hmisc and psych
> packages do not work for me.
>
> How can I get a table from an R data.frame that has the following
> structure (header row and example)?
>
> Rowname  Character/Integer  Length  Non-Missing  Minimum       Maximum       Mean           SD
> HHID     Integer            12      32,344       114455007701  514756007812  2.345 x 10^10  1.456 x 10^10
> Head     Character          38      24,566       -             -             -              -
I generally use (in order of increasing information content and increasing
length of output):
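names(dfrm)             # column names only
str(dfrm)               # class and first few values of each column
Hmisc::describe(dfrm)   # per-variable non-missing, missing and distinct counts plus summary statistics (needs Hmisc installed)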
(Several other packages have their own versions of 'describe'.)
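If you want exactly the flat, one-row-per-variable layout sketched in the question, a minimal base-R sketch follows. Treat it as an illustration under stated assumptions rather than a standard tool. The helper name contents_means is hypothetical (it is not part of base R, Hmisc or psych). The 'Length' column is an assumption: R has no SAS-style storage length, so the sketch reports the longest string for character columns and NA otherwise. The example data frame is made up.

```r
# Hypothetical helper (not from any package): one row per column of a data
# frame, in column order, roughly combining SAS PROC CONTENTS VARNUM with
# PROC MEANS. Numeric summaries are computed only for numeric columns.
contents_means <- function(df) {
  rows <- lapply(names(df), function(v) {
    x <- df[[v]]
    ok <- x[!is.na(x)]                      # non-missing values only
    is_num <- is.numeric(x) && length(ok) > 0
    # Assumption: "Length" = longest string for character columns, NA otherwise
    len <- if (is.character(x)) max(c(0L, nchar(ok))) else NA_integer_
    data.frame(
      Variable   = v,
      Type       = class(x)[1],
      Length     = len,
      NonMissing = length(ok),
      Minimum    = if (is_num) min(ok) else NA_real_,
      Maximum    = if (is_num) max(ok) else NA_real_,
      Mean       = if (is_num) mean(ok) else NA_real_,
      SD         = if (is_num && length(ok) > 1) sd(ok) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)                      # stack the one-row data frames
}

# Made-up example data with missing values in both columns
dfrm <- data.frame(
  HHID = c(114455007701, 514756007812, NA),
  Head = c("Smith", NA, "Jones-Okafor"),
  stringsAsFactors = FALSE
)
res <- contents_means(dfrm)
print(res)
```

Because the result is an ordinary data frame, you can sort or filter it. For example, res[res$NonMissing < nrow(dfrm), ] lists the columns that contain missing values.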
--
David Winsemius, MD
West Hartford, CT