[R] problems with by()
Thomas Lumley
tlumley at u.washington.edu
Thu Jan 30 01:19:03 CET 2003
On Wed, 29 Jan 2003, Heberto Ghezzo wrote:
> Hello, another problem.
> > x<-rep(1,10)
> > y<-rep(c(1,2),c(5,5))
> > z<-seq(1:10)
> > ab<-data.frame(x,y,z)
> #
> now I want to do some work by the value of 'y'
> > by(ab,y,mean)
> y: 1
> x y z
> 1 1 3
> ------------------------------------------------------------
> y: 2
> x y z
> 1 2 8
> #
> I do not want all the means, only the mean of 'z'
> > by(ab,y,function(x) mean(z))
> y: 1
> [1] 5.5
> ------------------------------------------------------------
> y: 2
> [1] 5.5
> > by(ab,y,function(x) mean(z,data=x))
> y: 1
> [1] 5.5
> ------------------------------------------------------------
> y: 2
> [1] 5.5
> >
> #
> so, how can I get the function(x) to be applied to each level
> of the index variable y.
> Actually I use my own function but the same happens, it is applied to all
> the data and there is no partition of the data according to index
The function you are applying is
function(x) mean(z)
That is, no matter what x is supplied, it calculates the mean of the
variable z in your global workspace, which is 5.5 for every level of y.
What you want is
function(x) mean(x$z)
That is, take a supplied data frame and compute the mean of its `z'
column.
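A minimal, self-contained sketch of the difference, runnable in a current R session. It rebuilds the data from the question (using 1:10, which gives the same values as seq(1:10)) and compares a function that ignores its argument with one that uses it. It also shows with(), which evaluates z inside each subset. That is presumably what the data=x attempt was after: mean() has no data argument, so data=x is passed into `...` and silently ignored.

```r
# Rebuild the data from the question.
x <- rep(1, 10)
y <- rep(c(1, 2), c(5, 5))
z <- 1:10                      # same values as seq(1:10)
ab <- data.frame(x, y, z)

# Ignores its argument: always finds the global z, so every group gets 5.5.
wrong <- by(ab, y, function(x) mean(z))

# Uses its argument: x is the subset of ab for one level of y.
right <- by(ab, y, function(x) mean(x$z))

# with() looks up z among the columns of the subset first.
also_right <- by(ab, y, function(x) with(x, mean(z)))

print(as.numeric(wrong))       # [1] 5.5 5.5
print(as.numeric(right))       # [1] 3 8
print(as.numeric(also_right))  # [1] 3 8
```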
I try to use argument names that remind me what is happening in functions
like by():
by(ab, y, function(df) mean(df$z))
or even
by(ab, y, function(subset) mean(subset$z))
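The same naming convention in a complete script (a sketch, not code from the original thread). The second call reproduces the per-group column means from the first by(ab,y,mean) in the question. It uses colMeans() because in current versions of R, mean() applied to a whole data frame returns NA with a warning instead of column means.

```r
x <- rep(1, 10)
y <- rep(c(1, 2), c(5, 5))
z <- 1:10
ab <- data.frame(x, y, z)

# Naming the argument after what by() passes in (one subset of the
# data frame per level of y) makes it harder to reach for the global z.
z_means <- by(ab, y, function(subset) mean(subset$z))

# Means of every column within each group.
all_means <- by(ab, y, function(df) colMeans(df))

print(z_means)
print(all_means)
```

Each element of all_means is a named vector with entries x, y and z for one level of y.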
> Do not tell me that this version of R is completely buggy, I was waiting
> for the 1.7 to be out before upgrading
I think it's fair to characterise this sort of comment as `unhelpful'.
-thomas