[R] function in aggregate applied to specific columns only
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Jan 4 05:14:38 CET 2010
Here are 6 ways:
1. aggregate
> aggregate(basicSub["score"], basicSub["student"], mean)
  student score
1       1  55.0
2       2  60.0
3       3  67.5
2. tapply
> with(basicSub, tapply(score, student, mean))
   1    2    3
55.0 60.0 67.5
3. summaryBy in doBy package
> library(doBy)
> summaryBy(. ~ student, basicSub)
  student score.mean
1       1       55.0
2       2       60.0
3       3       67.5
4. sqldf in sqldf package. Uses SQL:
> library(sqldf)
> sqldf("select student, avg(score) from basicSub group by student")
  student avg(score)
1       1       55.0
2       2       60.0
3       3       67.5
5. summary.formula in Hmisc package
> library(Hmisc)
> summary(score ~ student, basicSub)
score    N=5
+-------+-+-+-----+
|       | |N|score|
+-------+-+-+-----+
|student|1|2|55.0 |
|       |2|1|60.0 |
|       |3|2|67.5 |
+-------+-+-+-----+
|Overall| |5|61.0 |
+-------+-+-+-----+
6. plyr (see Dennis Murphy's solution in this thread)
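For item 6, a minimal plyr sketch might look like the following. It is not necessarily the solution posted elsewhere in this thread. It assumes the plyr package is installed, rebuilds basicSub from the original post (with the student IDs written as plain numbers), and uses byStudent as an illustrative name only:

```r
library(plyr)  # assumption: plyr is installed

# Data from the original post (0001 etc. are just the numbers 1, 2, 3)
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Split basicSub by student, take the mean score of each piece,
# and recombine the pieces into one data frame
byStudent <- ddply(basicSub, .(student), summarise, score = mean(score))
byStudent
```

As with tapply and sqldf above, only the columns named in the call are summarised, so the gender factor never reaches mean().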
On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
<dhshanab at acad.umass.edu> wrote:
> I want to use aggregate with the mean function on specific columns
>
> gender <- factor(c("m", "m", "f", "f", "m"))
> student <- c(0001, 0002, 0003, 0003, 0001)
> score <- c(50, 60, 70, 65, 60)
> basicSub <- data.frame(student, gender, score)
> basicSubMean <- aggregate(basicSub, by=list(basicSub$student), FUN=mean, na.rm=TRUE)
>
> This doesn't work because one cannot take the mean of a factor (gender). Is there any way of specifying which columns to use for the mean? I want to aggregate by student, obtaining mean scores, and assume any other factors (e.g. gender) are unchanging within a specific student.
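On carrying gender along: one possible approach, sketched here under the assumption stated in the question that gender never changes within a student, is to put gender among the grouping columns so that only score is averaged. The data are rebuilt from the question so the snippet runs on its own:

```r
# Data from the question (0001 etc. are just the numbers 1, 2, 3)
gender <- factor(c("m", "m", "f", "f", "m"))
student <- c(1, 2, 3, 3, 1)
score <- c(50, 60, 70, 65, 60)
basicSub <- data.frame(student, gender, score)

# Group by student and gender together; mean() is applied to score only,
# so the factor column is never averaged
basicSubMean <- aggregate(basicSub["score"],
                          basicSub[c("student", "gender")],
                          mean)
basicSubMean
```

If gender did vary within a student, that student would simply appear in more than one row of the result, which also makes this a quick check of the assumption.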
>
> Thanks