[R] function in aggregate applied to specific columns only

Gabor Grothendieck ggrothendieck at gmail.com
Mon Jan 4 05:14:38 CET 2010


Here are 6 ways:

1. aggregate

> aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5
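
The aggregate call above drops gender, which the original question also wanted to keep. A minimal sketch, assuming (as the question does) that gender never changes within a student: add gender as a second grouping column. The data and the name basicSubMean are taken from the question quoted below; the final sort is only there for readability and is not part of the original post.

```r
# Data from the original question (leading zeros in 0001 etc. are dropped by R).
gender  <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score   <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Group by student AND gender. Because gender is assumed constant within
# a student, this still gives one row per student and keeps gender in the result.
basicSubMean <- aggregate(basicSub["score"],
                          basicSub[c("student", "gender")],
                          mean)

# Sort rows by student for readability.
basicSubMean <- basicSubMean[order(basicSubMean$student), ]
print(basicSubMean)
```

If gender did vary within a student, this would return one row per student/gender combination rather than one row per student, so the assumption is worth checking first.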

2. tapply

> with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5

3. summaryBy in doBy package

> library(doBy)
> summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5

4. sqldf in sqldf package.  Uses SQL:

> library(sqldf)
> sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5

5. summary.formula in Hmisc

> library(Hmisc)
> summary(score ~ student, basicSub)
score    N=5

+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+

6. plyr (see Dennis Murphy's solution in this thread)


On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
<dhshanab at acad.umass.edu> wrote:
> I want to use aggregate with the mean function on specific columns
>
> gender <- factor(c("m", "m", "f", "f", "m"))
> student <- c(0001, 0002, 0003, 0003, 0001)
> score <- c(50, 60, 70, 65, 60)
> basicSub <- data.frame(student, gender, score)
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
>
> This doesn't work because one cannot take the mean of a factor (gender).  Is there any way to specify which columns to use for the mean?  I want to aggregate by student, obtaining mean scores, and assume that any other factors, e.g. gender, do not change within a given student.
>
> Thanks