[R] hairy indexing problem
jasont@indigoindustrial.co.nz
jasont at indigoindustrial.co.nz
Wed Jun 5 07:00:42 CEST 2002
> I've got a data frame that looks like this:
>
> subject foo bar
> 2 1.7 3.2
> 2 2.3 4.1
> 3 7.6 2.3
> 3 7.1 3.3
> 3 7.3 2.3
> 3 7.4 1.3
> 5 6.2 6.1
> 5 3.4 6.9
> ...
>
> That is, I've got multiple rows per subject. I need to compute
> summaries within categories where the subject has the same number of
> rows. For example, subjects 2 and 5 both have two rows. I need to
> compute the mean of those four values of foo. Can someone please give
> me a pointer on the canonical way to do this?
Canonical? Would you settle for "it works for me"? ;)
I suspect one of the gurus has a tidy, elegant way of doing this, but, not
being a guru, here's how I'd do it. Run-length encoding works pretty well for
things like this: rle() on the sorted subject column gives the number of rows
for each subject.
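For anyone who hasn't met rle() before, here is a minimal sketch of what it returns. The vector x is just the subject column from the example data, typed in by hand; the names x and r are only for illustration.

```r
# Subject ids, already sorted so that equal values sit next to each other
x <- c(2, 2, 3, 3, 3, 3, 5, 5)

r <- rle(x)
r$values    # the distinct subjects, in order: 2 3 5
r$lengths   # how many rows each subject has: 2 4 2

# Subjects that have exactly two rows
two.row.subjects <- r$values[r$lengths == 2]
two.row.subjects   # 2 5
```

Note that rle() only counts consecutive runs, which is why the subject column has to be sorted first.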
> d1 <- data.frame(subject=c(2,2,3,3,3,3,5,5),
+ foo=c(1.7,2.3,7.6,7.1,7.3,7.4,6.2,3.4))
> d1
  subject foo
1       2 1.7
2       2 2.3
3       3 7.6
4       3 7.1
5       3 7.3
6       3 7.4
7       5 6.2
8       5 3.4
> d1.subj.rle <- rle(d1$subject[order(d1$subject)])
## make a vector of the distinct row counts per subject
> n.subj <- unique(d1.subj.rle$lengths)
## now take the mean of foo over all subjects that share each row count
> sapply(n.subj,function(x,...) {
+ mean(d1$foo[d1$subject %in% d1.subj.rle$values[d1.subj.rle$lengths == x]])})
[1] 3.40 7.35
##check the numbers
> mean(d1$foo[d1$subject == 2 | d1$subject == 5])
[1] 3.4
> mean(d1$foo[d1$subject == 3])
[1] 7.35
>
That could be a *lot* clearer inside the sapply function; maybe in v2.0 of my
attempt at this ;)
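
One way the clearer version might look, offered as a sketch and not as the canonical answer: tag every row with its subject's row count using ave(), then let tapply() take the mean of foo within each row-count group. The names n.rows and res are made up for this example, and d1 is rebuilt so the snippet runs on its own.

```r
d1 <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                 foo = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# For every row, the number of rows belonging to that row's subject
n.rows <- ave(d1$foo, d1$subject, FUN = length)

# Mean of foo within each group of subjects that share a row count;
# the result is named by row count ("2" and "4" here)
res <- tapply(d1$foo, n.rows, mean)
res   # 3.40 for the two-row subjects, 7.35 for the four-row subject
```

This avoids both the sorting step and the nested indexing, and the names on the result say which row count each mean belongs to.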
Cheers
Jason