# [R] sapply following using by with a list of factors

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon May 30 13:34:33 CEST 2005

```
McClatchie, Sam (PIRSA-SARDI) wrote:
> Background:
> OS: Linux Mandrake 10.1
> release: R 2.0.0
> editor: GNU Emacs 21.3.2
> front-end: ESS 5.2.3
> ---------------------------------
> Colleagues
>
> I am having some trouble extracting results from the function by, used to
> average variables in a data.frame first by one factor (depth) and then by a
> second factor (station). The real data.frame is quite large:
>
>>dim(data.2001)
>
> [1] 32049  11
>
> Here is a snippet of code:
>
> ## bin density data for each station into 1 m depth bins, containing means
>     data.2001.test$integer.Depth <- as.factor(round(data.2001.test$Depth, digits=0))
>     attach(data.2001.test)
>     binned.data.2001 <- by(data.2001.test[,5:11], list(depth=integer.Depth, station=Station), mean)
>
> and here is a snippet of the data.frame
>
>
>>dim(data.2001.test)
>
> [1] 150  11
>
>>dump("data.2001.test", file=stdout())
>
>
> When I run this code on the full dataset, calculations continue long enough
> to suggest I am generating a huge matrix, so perhaps I'm doing something
> silly? Eventually (well, after maybe 5 minutes) I get a by class object of
> 109 rows (depth category) by 288 columns (station category), so it does seem
> to be working.
>
> I know that you use sapply to get the by class back to a data.frame. I want
> to extract a matrix of mean densities (one of the original variables) at
> each of the 109 depths and 288 stations.
> I have not quite got this right...can you help?
>
> Thanks in advance
>
> Sam
> ----
> Sam McClatchie,
> Biological oceanography
> South Australian Aquatic Sciences Centre

You might try the summarize function in the Hmisc package.
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
Department of Biostatistics   Vanderbilt University

```
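One way to get a depth-by-station matrix for a single variable out of a `by` object is sketched below. Several things here are assumptions. The data frame `dat` is a small hypothetical stand-in, because the post's `dump()` output is not reproduced in the archive. The column names `Density` and `Temp` are made up. `colMeans` replaces the original `mean`, since calling `mean()` on a data frame no longer returns column means in current R. A `by` result is a list carrying `dim` and `dimnames` attributes, with one cell per depth/station combination, and empty combinations hold `NULL`. `sapply()` walks those cells in column-major order, so the extracted vector can be given the `by` object's dimensions back.

```r
# Hypothetical stand-in for data.2001.test (column names and values are made up).
dat <- data.frame(
  Station = factor(c("S1", "S1", "S1", "S2", "S2", "S3")),
  Depth   = c(0.8, 1.2, 2.1, 1.4, 0.6, 2.2),
  Density = c(10, 20, 30, 40, 50, 60),
  Temp    = c(15, 15, 14, 16, 16, 13)
)
dat$integer.Depth <- factor(round(dat$Depth, digits = 0))

# Means of each numeric column for every depth/station cell.
# colMeans() is used because mean() no longer averages data frame columns.
binned <- by(dat[, c("Density", "Temp")],
             list(depth = dat$integer.Depth, station = dat$Station),
             colMeans)

# Pull the Density mean out of each cell; empty cells are NULL, so use NA.
dens <- sapply(binned, function(x) if (is.null(x)) NA_real_ else x[["Density"]])

# sapply() returns a plain vector in column-major order; restore the grid.
dim(dens) <- dim(binned)
dimnames(dens) <- dimnames(binned)
print(dens)
```

Rows of `dens` are the 1 m depth bins and columns are stations, which mirrors the 109 by 288 layout described in the question.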
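A possibly simpler route, which is not the one suggested in the thread, is to skip `by()` and call `tapply()` on each variable directly. This returns a depth-by-station matrix with `NA` for empty combinations. Because `tapply()` works on one numeric vector instead of subsetting the whole data frame for every cell, it may be faster on the full 32049-row data, although that has not been timed here. The data frame `dat` and its column names `Density` and `Temp` are hypothetical stand-ins for `data.2001.test`.

```r
# Hypothetical stand-in for data.2001.test (column names and values are made up).
dat <- data.frame(
  Station = factor(c("S1", "S1", "S1", "S2", "S2", "S3")),
  Depth   = c(0.8, 1.2, 2.1, 1.4, 0.6, 2.2),
  Density = c(10, 20, 30, 40, 50, 60),
  Temp    = c(15, 15, 14, 16, 16, 13)
)
dat$integer.Depth <- factor(round(dat$Depth, digits = 0))

# Grouping factors: rows of each result are depth bins, columns are stations.
groups <- list(depth = dat$integer.Depth, station = dat$Station)

# One depth-by-station matrix of means per variable; empty cells come back NA.
vars <- c("Density", "Temp")
binned.list <- lapply(dat[vars], function(v) tapply(v, groups, mean))

# The matrix of mean densities asked about in the question.
dens <- binned.list$Density
print(dens)
```

`binned.list` is a named list with one matrix per variable, so other variables are available the same way (for example `binned.list$Temp`). The `summarize()` function in Hmisc mentioned in the reply is another option and is not shown here.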