[R] hairy indexing problem

Wed Jun 5 07:00:42 CEST 2002

> I've got a data frame that looks like this:
> 
>    subject   foo   bar
>       2      1.7   3.2
>       2      2.3   4.1
>       3      7.6   2.3
>       3      7.1   3.3
>       3      7.3   2.3
>       3      7.4   1.3
>       5      6.2   6.1
>       5      3.4   6.9
>      ...
> 
> That is, I've got multiple rows per subject.  I need to compute
> summaries within categories where the subject has the same number of
> rows.  For example, subject 2 and 5 both have two rows.  I need to
> compute the mean for those four values of foo.  Can someone please
> give me a pointer on the canonical way to do this?

Canonical?  Would you settle for "it works for me"?  ;)

I suspect one of the gurus has a tidy, elegant way of doing this, but here's
how I'd do it (not being a guru).  Run-length encoding works pretty well
for things like this: rle() on the sorted subject column gives the number
of rows per subject.
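
In case rle() is unfamiliar, here is a minimal sketch of what it returns for
the subject column of the example data.  The variable names (subject, runs)
are made up for illustration only.

```r
# Subject column from the example data (names here are illustrative).
subject <- c(2, 2, 3, 3, 3, 3, 5, 5)

# rle() collapses consecutive runs of equal values.  Sorting first
# guarantees that each subject forms exactly one run.
runs <- rle(sort(subject))

runs$values   # the distinct subjects
runs$lengths  # the number of rows for each of those subjects
```

So $lengths is the "rows per subject" vector that the rest of the approach
groups on.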

> d1 <- data.frame(subject=c(2,2,3,3,3,3,5,5),
+                  foo=c(1.7,2.3,7.6,7.1,7.3,7.4,6.2,3.4))
> d1
  subject foo
1       2 1.7
2       2 2.3
3       3 7.6
4       3 7.1
5       3 7.3
6       3 7.4
7       5 6.2
8       5 3.4
## run-length encode the sorted subject column: $values holds the
## subjects, $lengths the number of rows for each one
> d1.subj.rle <- rle(d1$subject[order(d1$subject)])
## make a vector of the distinct row counts per subject
> n.subj <- unique(d1.subj.rle$lengths)
## now take means of foo over the subjects sharing each row count
> sapply(n.subj,function(x,...) { 
+ mean(d1$foo[d1$subject %in% d1.subj.rle$values[d1.subj.rle$lengths == x]])})
[1] 3.40 7.35 
##check the numbers
> mean(d1$foo[d1$subject == 2 | d1$subject == 5])
[1] 3.4
> mean(d1$foo[d1$subject == 3])
[1] 7.35
> 

That could be a *lot* clearer inside the sapply function; maybe in v2.0 of my 
attempt at this ;)
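
For what it's worth, one possible tidier version is sketched below.  It swaps
rle() and sapply() for ave() and tapply(), so it is a different technique
from the one above, not a cleanup of it.  The column name n.rows and the
variable means are my own invention for the example.

```r
d1 <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                 foo = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# For every row, count how many rows its subject has.  ave() applies
# length() within each subject and returns a vector aligned with the
# rows of d1.  (n.rows is a hypothetical column name.)
d1$n.rows <- ave(d1$foo, d1$subject, FUN = length)

# Mean of foo within each group of subjects that share a row count.
# The result is named by row count.
means <- tapply(d1$foo, d1$n.rows, mean)
means
```

Unlike the rle() version, this does not need the data sorted by subject, and
the names on the result say which row count each mean belongs to.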

Cheers

Jason
