Frank E Harrell Jr
Fri Aug 23 06:04:26 CEST 2002

On Thu, 22 Aug 2002 17:09:34 -0500
Tim Wilson wrote:

> Hi everyone,
>
> I wonder if there's a patient soul out there who has a minute to look at
> the following.
>
> I've got a set of summary statistics I need to perform many times.
> Naturally, I've looked at writing a function to automate the process as
> much as possible. (These are the data I mentioned recently in my
> question about weighted means.) I'm having trouble figuring out the
> proper syntax for taking the results of several different functions and
> combining them into a single function. I'm pasting an example below of the
> analysis I need to do for each column of a number of dataframes. This
> works perfectly, but repeating this procedure a couple hundred times
> doesn't thrill me.
>
> The only thing that isn't complete below is that I need the describe
> function (from the Hmisc library) to give me the standard deviation
> as well as the mean. Is it possible to do that without modifying the
> describe function directly?
>
> I'd be glad to hear any suggestions from the R gurus on the list.
>
> -Tim
>
> > lapply(split(faculty$Q8, list(faculty$TWOYROR4, faculty$FACULTY)),
> describe)
> $"2.1"
> X[[1]]
>       n missing  unique    Mean
>      47       0       3   3.362
>
> 3 (38, 81%), 4 (1, 2%), 5 (8, 17%)
>
> $"4.1"
> X[[2]]
>       n missing  unique    Mean
>     147       0       5   1.837
>
>           0  1  2  3 4
> Frequency 1 59 57 23 7
> %         1 40 39 16 5
>
> $"2.2"
> X[[3]]
>       n missing  unique    Mean
>       2       0       1       3
>
> $"4.2"
> X[[4]]
>       n missing  unique    Mean
>      25       0       5     1.8
>
>           0  1  2  3 4
> Frequency 2  8  9  5 1
> %         8 32 36 20 4
>
> > a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), mean)
>
> NOTE: I'm using the aggregate function to weight the means so that each
> CETP contributes equally to an overall mean and standard deviation. I
> need to use this procedure on each of the four results of lapply above.
> I can't figure that out at all.
>
> > a
>                   CETP        x
> 1  ACEPT               2.521739
> 2  LaCEPT              1.666667
> 3  MASTEP              2.442308
> 4  MMSTEC              1.900000
> 5  NYCETP              1.875000
> 6  PETE                1.600000
> 7  STEMTEC             2.428571
> 9  TxCETP              2.218182
> 10 VCEPT               2.222222
> > mean(a$x)
> [1] 2.162469
> > a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), sd)
> > mean(a$x)
> [1] 1.041506
> [1] 1.041506
> >
>
Tim - describe takes a weights= argument, but you're right - describe does not compute the SD [due to my bias against SD as a descriptive statistic, especially for skewed data].
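
One way to automate the procedure in the quoted message without touching describe is a small base-R wrapper. The following is only a sketch under stated assumptions: the function names cetp_summary and summarize_groups are hypothetical, the toy data frame is invented for illustration, and the "SD" is the mean of the per-CETP SDs, exactly as in the aggregate() calls above, not a pooled SD.

```r
# Hypothetical helper: equal-weight summary of x across the levels of cetp.
# Mirrors mean(aggregate(x, list(CETP = cetp), mean)$x) and the same with sd.
cetp_summary <- function(x, cetp) {
  m <- tapply(x, cetp, mean, na.rm = TRUE)  # one mean per CETP
  s <- tapply(x, cetp, sd, na.rm = TRUE)    # one SD per CETP (NA if only 1 obs)
  c(n    = sum(!is.na(x)),
    mean = mean(m, na.rm = TRUE),           # each CETP contributes equally
    sd   = mean(s, na.rm = TRUE))           # mean of per-CETP SDs
}

# Hypothetical wrapper: apply cetp_summary to column `var` within each
# TWOYROR4 x FACULTY cell of the data frame `df`.
summarize_groups <- function(df, var) {
  groups <- split(df, list(df$TWOYROR4, df$FACULTY), drop = TRUE)
  t(sapply(groups, function(g) cetp_summary(g[[var]], g$CETP)))
}

# Toy data (invented); column names follow the quoted message.
faculty <- data.frame(
  Q8       = c(1, 2, 3, 4, 2, 4, 0, 2),
  CETP     = c("A", "A", "B", "B", "A", "A", "B", "B"),
  TWOYROR4 = c(2, 2, 2, 2, 4, 4, 4, 4),
  FACULTY  = 1
)

res <- summarize_groups(faculty, "Q8")
print(res)
```

To cover several columns, something like lapply(c("Q8", "Q9"), function(v) summarize_groups(faculty, v)) should work, where "Q9" stands in for whatever other column names the data frames actually contain.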

