[R] attach data from tapply to dataframe
Gabor Grothendieck
ggrothendieck at myway.com
Tue Aug 3 18:58:04 CEST 2004
Doran, Harold <HDoran <at> air.org> writes:
:
: I am working with a longitudinal data set in the long format. This data
: set has three observations per grade level per year. Here are the first
: 10 rows of the data frame:
:
: > tenn.dat[1:10, ]
:    year  schid type grade gain  se new cohort
: 6  2001 100005    5     4 33.1 3.5   4      3
: 7  2002 100005    5     4 33.9 3.9   4      2
: 8  2003 100005    5     4 32.3 4.2   4      1
: 10 2001 100005    5     5 22.9 4.0   5      4
: 11 2002 100005    5     5 25.0 3.4   5      3
: 12 2003 100005    5     5  7.8 3.8   5      2
: 18 2001 100010    1     4 34.4 5.9   4      3
: 19 2002 100010    1     4 27.8 5.6   4      2
: 20 2003 100010    1     4 34.6 6.8   4      1
: 22 2001 100010    1     5 21.1 4.8   5      4
:
: I need to create two new columns in this data frame: the mean gain for
: each grade by year, and the sd of gain for each grade by year.
:
: So, I used tapply as follows:
:
: tapply(tenn.dat[, 5], tenn.dat[, c(1, 4)], mean) and
: tapply(tenn.dat[, 5], tenn.dat[, c(1, 4)], sd), which produce exactly the
: data I would like to attach as the first and second new columns, respectively.
:
: I am having a problem connecting this back with the corresponding rows
: in the data frame.
:
: When I used only one factor instead of two, I was able to connect the
: result to the data frame using:
:
: m.gain <- tapply(tenn.dat[, 5], tenn.dat[, 4], mean)
:
: tenn.dat$m.gain <- m.gain[as.character(tenn.dat$grade)]
:
: Can anyone suggest a next step?
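The one-factor lookup in the question can be extended to two factors without leaving tapply. This is a sketch added for illustration, not part of the original reply: it rebuilds only the year, grade and gain columns from the ten rows shown above, and relies on R's support for indexing an array with a character matrix whose columns are matched against the array's dimnames.

```r
# Only the columns needed here, taken from the ten rows shown in the question
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# Two-way tables: rows are years, columns are grades
m.gain  <- tapply(tenn.dat$gain, tenn.dat[, c("year", "grade")], mean)
sd.gain <- tapply(tenn.dat$gain, tenn.dat[, c("year", "grade")], sd)

# A character matrix with one column per dimension is matched against the
# dimnames, so each row of tenn.dat picks out its own year/grade cell
key <- cbind(as.character(tenn.dat$year), as.character(tenn.dat$grade))
tenn.dat$m.gain  <- m.gain[key]
tenn.dat$sd.gain <- sd.gain[key]
```

With only these ten rows, the 2002 and 2003 grade-5 groups contain a single observation each, so their sd is NA.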
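Suggest you use by instead of tapply, like this:
# add the mean and sd of gain to one year/grade subset
f <- function(x) { x$mean.gain <- mean(x$gain); x$sd.gain <- sd(x$gain); x }
# apply f to every year-by-grade subset of the data frame
res <- by(tenn.dat, list(tenn.dat$year, tenn.dat$grade), f)
# stack the subsets back into one data frame; rows end up grouped by
# year and grade rather than in their original order
tenn.dat <- do.call("rbind", res)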
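Another option, not taken from the reply above, is ave(), which computes a summary within each group and returns it already aligned with the original rows, so no lookup or re-stacking is needed and the row order is preserved. A minimal sketch on the same three columns from the question's data:

```r
# Only the columns needed here, taken from the ten rows shown in the question
tenn.dat <- data.frame(
  year  = c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001),
  grade = c(4, 4, 4, 5, 5, 5, 4, 4, 4, 5),
  gain  = c(33.1, 33.9, 32.3, 22.9, 25.0, 7.8, 34.4, 27.8, 34.6, 21.1)
)

# ave() applies FUN within each year/grade group and returns a vector in the
# original row order; FUN must be passed by name because of the ... argument
tenn.dat$mean.gain <- ave(tenn.dat$gain, tenn.dat$year, tenn.dat$grade, FUN = mean)
tenn.dat$sd.gain   <- ave(tenn.dat$gain, tenn.dat$year, tenn.dat$grade, FUN = sd)
```

As with the tapply version, single-observation groups get an sd of NA.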