[R] means by column after split

Tue Feb 20 08:43:19 CET 2001

On Tue, 20 Feb 2001, S.McClatchie wrote:

> Colleagues
>
> ----------------------------------
> System info:
> R version rw1020 on NT
> ESS using emacs ver. 20.4
>
> ----------------------------------
>
> I need to get the means for each column of a dataframe in the list
> created by splitting a data frame. At present, I am getting the mean of all
> columns in aggregate.
>
> The structure of the unsplit data is:
>
> > shuttle.tr1[1:10,]
>       juliandate       lat temp.degC
> 24892   305.9581 -43.18243  11.90729
> 24893   305.9581 -43.18258  11.90854
> 24894   305.9582 -43.18272  11.94356
> 24895   305.9582 -43.18286  11.95356
> 24896   305.9583 -43.18300  11.95544
> 24897   305.9583 -43.18315  11.97670
> 24898   305.9584 -43.18329  11.99171
> 24899   305.9584 -43.18343  11.99546
> 24900   305.9585 -43.18358  11.98546
> 24901   305.9585 -43.18372  11.98858
>
> Now I split the data into 8 groups:
>
> > fine <- 8
> > fsh <- factor(cut(shuttle.tr1$juliandate, fine))
> > b.shuttle.tr1 <- split.data.frame(shuttle.tr1, fsh)
>
> Here's where I need help, in getting the mean by columns:
>

Here b.shuttle.tr1 is a list of data frames.  You may find it easier to
use by(), BTW.  You don't want to apply mean to the data frame, but to
each column of it.  So something like

colmean <- function(x) sapply(x, mean)
by(shuttle.tr1, fsh, colmean)

should do what you want.
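Since b.shuttle.tr1 is already a list of data frames, the same colmean can also be applied to it directly. The sketch below assumes a made-up stand-in for shuttle.tr1 (the real data are not in this thread) and uses split() rather than calling split.data.frame by name. sapply() returns one column per group, so t() turns that into one row per group.

```r
# Hypothetical stand-in for shuttle.tr1: 80 rows with the same three columns
set.seed(1)
shuttle.tr1 <- data.frame(
  juliandate = seq(305.9581, by = 0.00005, length.out = 80),
  lat        = seq(-43.18243, by = -0.00015, length.out = 80),
  temp.degC  = 11.9 + rnorm(80, sd = 0.05)
)

# Mean of each column of one data frame
colmean <- function(x) sapply(x, mean)

# Split into 8 groups by juliandate, as in the question
fsh <- cut(shuttle.tr1$juliandate, 8)
b.shuttle.tr1 <- split(shuttle.tr1, fsh)

# Apply colmean to every data frame in the list; sapply() simplifies to a
# 3 x 8 matrix (one column per group), so transpose to one row per group
grp.means <- t(sapply(b.shuttle.tr1, colmean))
print(grp.means)
```

This gives the same numbers as by(shuttle.tr1, fsh, colmean), already collected into a matrix.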

Let's try an example:
data(iris)
by(iris[, -5], iris[, 5], colmean)
iris[, 5]: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------
iris[, 5]: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------
iris[, 5]: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026

zz <- by(iris[, -5], iris[, 5], colmean)
matrix(unlist(zz), 3, 4, byrow=TRUE, dimnames=list(names(zz), names(zz[[1]])))
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026
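The matrix() call above hard-codes the 3 x 4 shape. As a sketch of two alternatives (not from the original reply), do.call(rbind, zz) stacks the per-group vectors and takes the group names as row names, and aggregate() computes the same means directly, returning a data frame rather than a matrix.

```r
data(iris)
colmean <- function(x) sapply(x, mean)
zz <- by(iris[, -5], iris[, 5], colmean)

# Stack the per-species mean vectors; no dimensions are hard-coded
m1 <- do.call(rbind, zz)

# aggregate() gives one row per species, with Species as the first column
m2 <- aggregate(iris[, -5], by = list(Species = iris[, 5]), FUN = mean)

print(m1)
print(m2)
```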

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html