[R] basic table statistics

David Winsemius dwinsemius at comcast.net
Sat Apr 24 23:25:14 CEST 2010

```
On Apr 23, 2010, at 3:48 PM, Maxim wrote:

> I have a very simple question, but I'm obviously not able to solve the
> problem on my own.
>
> I have a data.frame like
>
> sample(c("A","B","C"),size=20,replace = T)->type
>
> rnorm(20)->value
>
> data.frame(ty=type,val=value)->test
>
> There must be some built in functions, that will do some descriptive
> statistics with tabular output, in the end I like to have something
> like
>
> number of samples mean sd .............
>
> A 5
> B 9
> C 6
>
> So I need a function that counts the number of  occurrences of
> factors in
> type and then does something like the *summary* function, but factor
> specific.
>
> I tried:
> vector()->Median
> vector()->SD
> vector()->Mean
>
> as.data.frame(table(type))->int
> for (count in c(1:(nrow(int))))
>     {
> subset(test, ty==as.character(int$type[count])) -> subtest
> median(subtest$val)->Median[count]
> sd(subtest$val)->SD[count]
> mean(subtest$val)->Mean[count]
> }
>
>
> cbind(int,Median,SD,Mean)

> require(Design)  # loads Hmisc, which has one of many versions of describe()
> describe(test)
test

2  Variables      20  Observations
-------------------------------------------------------------------------
ty
n missing  unique
20       0       3

A (4, 20%), B (5, 25%), C (11, 55%)
-------------------------------------------------------------------------
val
n   missing    unique      Mean       .05       .10       .25
20         0        20   0.07383 -0.865776 -0.815317 -0.707465
.50       .75       .90       .95
0.005735  0.634226  1.270066  1.771820

lowest : -1.7965 -0.8168 -0.8152 -0.8040 -0.7170
highest:  0.6790  1.0680  1.2149  1.7665  1.8729
-------------------------------------------------------------------------

> require(doBy)
> summaryBy(value~ty, test, FUN=list(length, mean, min, max, sd,
median))
ty value.length  value.mean  value.min value.max  value.sd
1  A            4 -0.03442822 -0.8151531  1.766502 1.2258221
2  B            5  0.34541927 -0.8167919  1.214906 0.7647165
3  C           11 -0.01025352 -1.7964684  1.872865 1.0109676
value.median
1  -0.54453098
2   0.57020532
3  -0.06826249

The by() function, which is a wrapper for tapply() applied to data
frames, can also be used.

>
>
>
> This works, but: isn't this much too complicated, I bet there is such
> functionality embedded in the base packages, but I cannot find it.
>
>
> Maxim

David Winsemius, MD
West Hartford, CT

```
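For the base-package route the original question asks about, here is a minimal sketch of the by()/tapply() idea mentioned in the reply. It is not taken from the thread: the `set.seed()` call and the helper name `stats` are assumptions added for illustration, and the data are regenerated, so the numbers will not match the output shown above.

```r
# Recreate the example data from the question.
# set.seed() is an addition here so that repeated runs give the same draw.
set.seed(1)
type  <- sample(c("A", "B", "C"), size = 20, replace = TRUE)
value <- rnorm(20)
test  <- data.frame(ty = type, val = value)

# Hypothetical helper: the per-group statistics asked for in the question.
stats <- function(x) c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))

# split() breaks val into one vector per level of ty; sapply() applies the
# helper to each piece; t() turns the result into one row per group.
res <- t(sapply(split(test$val, test$ty), stats))
res

# by() computes the same statistics and prints them group by group.
by(test, test$ty, function(d) stats(d$val))

# aggregate() returns a data frame; the statistics end up in a matrix column.
aggregate(val ~ ty, data = test, FUN = stats)
```

The split()/sapply() form gives a plain numeric matrix that is easy to extend with further statistics, whereas by() is mostly convenient for printing and aggregate() for getting a data frame back.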