[R] combining output from several operations

Fri Aug 23 06:04:26 CEST 2002

On Thu, 22 Aug 2002 17:09:34 -0500
Tim Wilson <wilson at visi.com> wrote:

> Hi everyone,
> 
> I wonder if there's a patient soul out there who has a minute to look at
> the following. 
> 
> I've got a set of summary statistics I need to perform many times.
> Naturally, I've looked at writing a function to automate the process as
> much as possible. (These are the data I mentioned recently in my
> question about weighted means.) I'm having trouble figuring out the
> proper syntax for taking the results of several different functions and
> combining them into a single function. I'm pasting an example below of the
> analysis I need to do for each column of a number of dataframes. This
> works perfectly, but repeating this procedure a couple hundred times
> doesn't thrill me.
> 
> The only thing that isn't complete below is that I need the describe
> function (from the Hmisc library) to give me the standard deviation
> as well as the mean. Is it possible to do that without modifying the
> describe function directly?
> 
> I'd be glad to hear any suggestions from the R gurus on the list.
> 
> -Tim
> 
> > lapply(split(faculty$Q8, list(faculty$TWOYROR4, faculty$FACULTY)), describe)
> $"2.1"
> X[[1]] 
>       n missing  unique    Mean 
>      47       0       3   3.362 
> 
> 3 (38, 81%), 4 (1, 2%), 5 (8, 17%) 
> 
> $"4.1"
> X[[2]] 
>       n missing  unique    Mean 
>     147       0       5   1.837 
> 
>           0  1  2  3 4
> Frequency 1 59 57 23 7
> %         1 40 39 16 5
> 
> $"2.2"
> X[[3]] 
>       n missing  unique    Mean 
>       2       0       1       3 
> 
> $"4.2"
> X[[4]] 
>       n missing  unique    Mean 
>      25       0       5     1.8 
> 
>           0  1  2  3 4
> Frequency 2  8  9  5 1
> %         8 32 36 20 4
> 
> > a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), mean)
> 
> NOTE: I'm using the aggregate function to weight the means so that each
> CETP contributes equally to an overall mean and standard deviation. I
> need to use this procedure on each of the four results of lapply above.
> I can't figure that out at all.
> 
> > a
>                   CETP        x
> 1  ACEPT               2.521739
> 2  LaCEPT              1.666667
> 3  MASTEP              2.442308
> 4  MMSTEC              1.900000
> 5  NYCETP              1.875000
> 6  PETE                1.600000
> 7  STEMTEC             2.428571
> 8  Temple/Philadelphia 2.750000
> 9  TxCETP              2.218182
> 10 VCEPT               2.222222
> > mean(a$x)
> [1] 2.162469
> > a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), sd)
> > mean(a$x)
> [1] 1.041506
> >
> 
> -- 
> Tim Wilson      |   Visit Sibley online:   | Check out:
> Henry Sibley HS |  http://www.isd197.org   | http://www.zope.com
> W. St. Paul, MN |                          | http://slashdot.org
> wilson at visi.com |  <dtml-var pithy_quote>  | http://linux.com
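Tim's second question is how to run the CETP-weighted mean and SD on each of the four `lapply()` groups. One approach is to split the whole data frame rather than just `Q8`, then apply a single function to each piece. The sketch below is minimal and rests on a few assumptions: it reuses the column names from the post, `cetp_summary` is a hypothetical helper rather than part of any package, and the toy data frame only stands in for the real `faculty` data. It mirrors the two `aggregate()` calls above by taking the mean of the per-CETP means and the mean of the per-CETP SDs.

```r
# Hypothetical helper: mean of per-CETP means and mean of per-CETP SDs,
# mirroring the two aggregate() calls in the post.
cetp_summary <- function(d, var = "Q8") {
  m <- aggregate(d[[var]], list(CETP = d$CETP), mean)
  s <- aggregate(d[[var]], list(CETP = d$CETP), sd)
  # Assumption: a CETP with a single respondent has an NA SD; it is dropped here.
  c(mean = mean(m$x), sd = mean(s$x, na.rm = TRUE))
}

# Toy stand-in for the 'faculty' data frame (hypothetical values).
faculty_toy <- data.frame(
  Q8       = c(3, 4, 2, 1, 2, 3, 0, 1),
  CETP     = c("A", "A", "B", "B", "A", "A", "B", "B"),
  TWOYROR4 = c(2, 2, 2, 2, 4, 4, 4, 4),
  FACULTY  = c(1, 1, 1, 1, 1, 1, 1, 1)
)

# Split the whole data frame by the same factors as the lapply() call;
# drop = TRUE skips empty combinations, which would make aggregate() fail.
groups <- split(faculty_toy,
                list(faculty_toy$TWOYROR4, faculty_toy$FACULTY),
                drop = TRUE)

# One column per group ("2.1", "4.1", ...), rows "mean" and "sd".
res <- sapply(groups, cetp_summary)
print(res)
```

With the real data the same two lines, `split(faculty, ..., drop = TRUE)` followed by `sapply(..., cetp_summary)`, should return one column for each of the four groups.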

Tim, describe does take a weights= argument, but you're right: it does not compute the SD (because of my bias against the SD as a descriptive statistic, especially for skewed data).
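If the SD is still wanted next to describe's counts without modifying describe, it can be computed in a small wrapper. The following is a minimal base-R sketch. `describe_sd` is a hypothetical name, not an Hmisc function, and it reproduces only the n / missing / unique / Mean line for a numeric vector, not describe's frequency table.

```r
# Hypothetical helper (not part of Hmisc): the counts and mean that
# describe() prints for a numeric vector, plus the SD it leaves out.
describe_sd <- function(x) {
  ok <- !is.na(x)
  c(n       = sum(ok),
    missing = sum(!ok),
    unique  = length(unique(x[ok])),
    Mean    = mean(x[ok]),
    SD      = sd(x[ok]))
}

# Toy groups standing in for split(faculty$Q8, ...) (hypothetical values).
groups <- list(a = c(3, 3, 5, 4, NA, 3),
               b = c(0, 1, 2, 2))

# One column per group, one row per statistic.
res <- sapply(groups, describe_sd)
print(res)
```

If the full describe() printout is still wanted, the two can be returned together per group, e.g. `lapply(split(...), function(x) list(describe(x), SD = sd(x, na.rm = TRUE)))`.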

Frank Harrell

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._