[R] attach data from tapply to dataframe

Gabor Grothendieck ggrothendieck at myway.com
Tue Aug 3 18:58:04 CEST 2004


Doran, Harold <HDoran <at> air.org> writes:

: 
: I am working with a longitudinal data set in the long format. This data
: set has three observations per grade level per year. Here are the first
: 10 rows of the data frame:
: 
: > tenn.dat[1:10,]
:    year  schid type grade gain  se new cohort
: 6  2001 100005    5     4 33.1 3.5   4      3
: 7  2002 100005    5     4 33.9 3.9   4      2
: 8  2003 100005    5     4 32.3 4.2   4      1
: 10 2001 100005    5     5 22.9 4.0   5      4
: 11 2002 100005    5     5 25.0 3.4   5      3
: 12 2003 100005    5     5  7.8 3.8   5      2
: 18 2001 100010    1     4 34.4 5.9   4      3
: 19 2002 100010    1     4 27.8 5.6   4      2
: 20 2003 100010    1     4 34.6 6.8   4      1
: 22 2001 100010    1     5 21.1 4.8   5      4
: 
: I need to add two new columns to this data frame: the mean gain for
: each grade by year and the sd of gain for each grade by year.
: 
: So, I used tapply as follows:
: 
: tapply(tenn.dat[,5],tenn.dat[,c(1,4)],mean) and
: tapply(tenn.dat[,5],tenn.dat[,c(1,4)],sd), which produce exactly the
: values I would like to attach as the two new columns.
: 
: I am having a problem connecting this back with the corresponding rows
: in the data frame.
: 
: If I used only one factor instead of two, I was successful connecting
: this with the data frame using:
: 
: m.gain<-tapply(tenn.dat[,5],tenn.dat[,4],mean)
: 
: tenn.dat$m.gain<-m.gain[as.character(tenn.dat$grade)]
: 
: Can anyone offer suggestions on a next step?
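
For reference, the one-factor indexing quoted above can probably be carried over to two factors by indexing the tapply matrix with a two-column character matrix, one (year, grade) pair per row. This is only a minimal sketch: the small data frame is a stand-in built from the ten rows shown, holding just the columns needed, and idx is a name made up for illustration.

```r
# Stand-in for tenn.dat: the ten rows from the question, needed columns only
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# Two grouping factors give a year-by-grade matrix of means
m.gain <- tapply(tenn.dat$gain, tenn.dat[, c("year", "grade")], mean)

# Each row of idx names one cell of that matrix (row name, column name)
idx <- cbind(as.character(tenn.dat$year), as.character(tenn.dat$grade))

# Character-matrix indexing looks up one cell per data row
tenn.dat$m.gain <- m.gain[idx]
```

The same idx should work unchanged on the matrix returned by the sd call.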


I suggest using by instead of tapply, like this:

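  # add the group mean and sd of gain to every row of one subset
  f <- function(x) { x$mean.gain <- mean(x$gain); x$sd.gain <- sd(x$gain); x }
  # apply f to each year-by-grade subset of tenn.dat, then stack the pieces
  res <- by(tenn.dat, list(tenn.dat$year, tenn.dat$grade), f)
  do.call("rbind", res)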
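
To see what this gives end to end, here is a small self-contained sketch of the by approach. It assumes tenn.dat looks like the ten rows quoted above; the data frame below is a stand-in with only the columns needed, and out is just an illustrative name for the result.

```r
# Stand-in for tenn.dat: the ten rows from the question, needed columns only
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  schid = c(rep(100005, 6), rep(100010, 4)),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# Add the group mean and sd of gain to every row of one subset
f <- function(x) {
  x$mean.gain <- mean(x$gain)
  x$sd.gain   <- sd(x$gain)  # NA when the subset has only one row
  x
}

# One subset per year-by-grade combination, then stack the pieces
res <- by(tenn.dat, list(tenn.dat$year, tenn.dat$grade), f)
out <- do.call("rbind", res)
```

Note that rbind stacks the subsets one after another, so the rows of out come back grouped by year and grade rather than in their original order. If the original order matters, ave(tenn.dat$gain, tenn.dat$year, tenn.dat$grade) returns the group means already aligned with the rows of tenn.dat, and passing FUN = sd does the same for the standard deviations.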



