# [R] problems with by()

Thomas Lumley tlumley at u.washington.edu
Thu Jan 30 01:19:03 CET 2003

```
On Wed, 29 Jan 2003, Heberto Ghezzo wrote:

> Hello, another problem.
>  > x<-rep(1,10)
>  > y<-rep(c(1,2),c(5,5))
>  > z<-seq(1:10)
>  > ab<-data.frame(x,y,z)
> #
>     now I want to do some work by the value of 'y'
>  > by(ab,y,mean)
> y: 1
> x y z
> 1 1 3
> ------------------------------------------------------------
> y: 2
> x y z
> 1 2 8
> #
>     I do not want all the means, only the mean of 'z'
>  > by(ab,y,function(x) mean(z))
> y: 1
> [1] 5.5
> ------------------------------------------------------------
> y: 2
> [1] 5.5
>  > by(ab,y,function(x) mean(z,data=x))
> y: 1
> [1] 5.5
> ------------------------------------------------------------
> y: 2
> [1] 5.5
>  >
> #
>     so, how can I get the function(x) to be applied to each level
> of the index variable y.
> Actually I use my own function but the same happens, it is applied to all
> the data and there is no partition of the data according to index

The function you are applying is

function(x) mean(z)

That is, no matter what x is supplied, it calculates the mean of the
variable z, which is in your global workspace. The mean of all of z is
5.5, so every group prints 5.5.

What you want is
function(x) mean(x$z)
That is, take a supplied data frame and compute the mean of its `z'
column.

I try to use argument names that remind me what is happening in functions
like by():

by(ab, y, function(df) mean(df$z))
or even
by(ab, y, function(subset) mean(subset$z))

> Do not tell me that this version of R is completely buggy, I was waiting
> for the 1.7 to be out before upgrading

I think it's fair to characterise this sort of comment as `unhelpful'.

-thomas

```
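A minimal, self-contained R sketch of the fix discussed above, assuming a current R installation. It rebuilds the data from the question and compares three functions. `function(df) mean(z)` and `function(df) mean(z, data = df)` both give the global mean 5.5 for each group. `function(df) mean(df$z)` gives the per-group means 3 and 8. Grouping by `ab$y` rather than the bare workspace `y`, and the `tapply()` cross-check, are choices made for this sketch and are not part of the original reply.

```r
# Rebuild the data from the thread (1:10 is equivalent to seq(1:10)).
x <- rep(1, 10)
y <- rep(c(1, 2), c(5, 5))
z <- 1:10
ab <- data.frame(x, y, z)

# Wrong: the function never uses its argument, so `z` is found in the
# global workspace and every group gets mean(1:10), i.e. 5.5.
wrong <- by(ab, ab$y, function(df) mean(z))

# Also wrong: mean() has no `data` argument; `data = df` is absorbed
# by `...` and ignored, so this still averages the global `z`.
also_wrong <- by(ab, ab$y, function(df) mean(z, data = df))

# Right: by() passes each subset of rows as `df`; use that subset's `z`.
right <- by(ab, ab$y, function(df) mean(df$z))
print(right)

# Cross-check: tapply() splits the `z` column by `y` directly.
check <- tapply(ab$z, ab$y, mean)
print(check)
```

Unlike by(), tapply() works on a single vector rather than a data frame. That is enough here because the function only needs the `z` column.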