[R] Aggregating multiple columns

Gabor Grothendieck ggrothendieck at gmail.com
Thu Mar 19 23:04:44 CET 2009


Try this:

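# Slope of y on order for rows i: cov(order, y) / var(order), no lm() needed
coef2 <- function(i) cov(x[i, "order"], x[i, "y"]) / var(x[i, "order"])
# Apply coef2 to each subject's row indices
tapply(1:nrow(x), x$subject, coef2)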
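
As a quick sanity check, here is a minimal sketch comparing the cov/var slopes with the ones lm() gives when fitted to each subject's rows. It assumes the simulated data frame from the question; the seed and the names slope_cov and slope_lm are only illustrative.

```r
# Assumption: same simulated data as in the question, with a fixed seed
# so the comparison is reproducible.
set.seed(1)
x <- data.frame(y = rnorm(100), order = rep(1:10, 10),
                subject = rep(1:10, each = 10))

# Closed-form slope per subject: cov(order, y) / var(order).
coef2 <- function(i) cov(x[i, "order"], x[i, "y"]) / var(x[i, "order"])
slope_cov <- tapply(1:nrow(x), x$subject, coef2)

# Reference: split() hands lm() one whole sub-data-frame per subject.
slope_lm <- sapply(split(x, x$subject),
                   function(d) coef(lm(y ~ order, data = d))[["order"]])

# Compare values only; the two results carry different attributes.
all.equal(as.vector(slope_cov), as.vector(slope_lm))
```

The two should agree up to floating-point tolerance, since the least-squares slope of a one-predictor regression is the covariance of predictor and response divided by the variance of the predictor.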

On Thu, Mar 19, 2009 at 5:41 PM, Adam D. I. Kramer <adik at ilovebacon.org> wrote:
> Dear colleagues,
>
>        Consider the following data frame:
>
> x <- data.frame(y=rnorm(100),order=rep(1:10,10),subject=rep(1:10,each=10))
>
>        ...it is my goal to aggregate x to compute a linear effect of order
> for each subject. So, ideally, result would be a vector containing a single
> number for each subject, representing the linear relationship between y and
> order.
>
>        I first tried this:
>
> result <- aggregate(x[,1:2],list(subject=x$subject),
>            function (z) { lm(y ~ order, data=z)$coefficients[2] }
>          )
>
> ...because lm(y ~ order, data=x, subset=x$subject==1)$coefficients[2] would
> give me the correct term for subject 1 (i.e., that is the number I am
> actually looking for).
>
>        However, when used on data frames, aggregate() aggregates every
> COLUMN in x _separately_ using FUN...while lm needs both columns *together.*
>
>        ...I then turned to tapply, but that is useful only on "atomic
> objects," and not data frames.
>
>        I have two solutions, which I find inelegant and slow:
>
> 1) result <- sapply(levels(factor(x$subject)),
>               function(z) { lm(y ~ order, data=x,
>                                subset=subject==z)$coefficients[2] }
>             )
>
> ...this gets the job done, but is very slow.
>
> 2) result <- c();
> for (z in 1:nlevels(x$s2)) {
>   result[z] <- lm(y ~ order, data=x,
>                   subset=x$s2==levels(x$s2)[z])$coefficients[2]
> };
> result <- unlist(result);
>
> ...also does the job, but is also very slow.
>
> Is there a better solution? I miss the speed of tapply and aggregate; the
> example has only 100 rows and 10 subjects, but the actual data has many more
> of each.
>
> Cordially,
> Adam D. I. Kramer
> Ph.D. Candidate, Social and Personality Psychology
> University of Oregon
>



