[R] function in aggregate applied to specific columns only

Matthew Dowle mdowle at mdowle.plus.com
Mon Jan 4 14:54:11 CET 2010


> That makes eight solutions. Any others?  :)
A ninth was detailed in two other threads last month. The first link 
compares to ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html
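
For anyone reading this thread from the archive, here is a minimal base R
sketch of the original question (taking the mean of specific columns only).
It is illustrative and is not the solution described in the two links above.
The first call repeats the aggregate() form quoted further down this thread.
The second assumes the formula method of aggregate() is available, which may
not hold on older R versions. The names byStudent and byStudentGender are
made up for this example.

```r
# Rebuild the example data from the original post.
basicSub <- data.frame(
  student = c(1, 2, 3, 3, 1),
  gender  = factor(c("m", "m", "f", "f", "m")),
  score   = c(50, 60, 70, 65, 60)
)

# Aggregate only the score column, grouped by student.
# Indexing with ["score"] keeps a one-column data frame, so the
# factor column gender is never passed to mean().
byStudent <- aggregate(basicSub["score"], basicSub["student"], mean)
print(byStudent)

# Assumption: aggregate() has a formula method in the R version in use.
# Putting gender on the right-hand side makes it a grouping variable,
# so it is carried through to the result instead of being averaged.
byStudentGender <- aggregate(score ~ student + gender, data = basicSub,
                             FUN = mean)
print(byStudentGender)
```

Treating gender as a grouping variable gives one row per student only if
gender really is constant within each student, as the original poster
assumes. Otherwise a student appears once per distinct gender value.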

"Dennis Murphy" <djmuser at gmail.com> wrote in message 
news:9a8a6c631001032057qc5cd68j9ec3882043dec0bc at mail.gmail.com...
> Just for the fun of it, here are two more: by and ave.
>
>
>> with(basicSub, by(score, student, mean))
> student: 1
> [1] 55
> ------------------------------------------------------------
> student: 2
> [1] 60
> ------------------------------------------------------------
> student: 3
> [1] 67.5
>
> Not my favorite print method; to return a vector, use this instead:
>> as.vector(with(basicSub, by(score, student, mean)))
> [1] 55.0 60.0 67.5
> You can cbind the unique student IDs to get a matrix result.
>
> ave() maps the average (or a comparable summary) to each observation.
> By itself, it returns a vector of the same length as the number of
> observations:
>> with(basicSub, ave(score, student))
> [1] 55.0 60.0 67.5 67.5 55.0
>
> It's more useful if you want to add the means to the data frame:
>> transform(basicSub, avg = ave(score, student))
>  student gender score  avg
> 1       1      m    50 55.0
> 2       2      m    60 60.0
> 3       3      f    70 67.5
> 4       3      f    65 67.5
> 5       1      m    60 55.0
>
> That makes eight solutions. Any others?  :)
>
> Dennis
>
>
> On Sun, Jan 3, 2010 at 8:14 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>
>> Here are 6 ways:
>>
>> 1. aggregate
>>
>> > aggregate(basicSub["score"], basicSub["student"], mean)
>>  student score
>> 1       1  55.0
>> 2       2  60.0
>> 3       3  67.5
>>
>> 2. tapply
>>
>> > with(basicSub, tapply(score, student, mean))
>>   1    2    3
>> 55.0 60.0 67.5
>>
>> 3. summaryBy in doBy package
>>
>> > library(doBy)
>> > summaryBy(. ~ student, basicSub)
>>  student score.mean
>> 1       1       55.0
>> 2       2       60.0
>> 3       3       67.5
>>
>> 4. sqldf in sqldf package.  Uses SQL:
>>
>> > library(sqldf)
>> > sqldf("select student, avg(score) from basicSub group by student")
>>  student avg(score)
>> 1       1       55.0
>> 2       2       60.0
>> 3       3       67.5
>>
>> 5. summary.formula in Hmisc
>>
>> > library(Hmisc)
>> > summary(score ~ student, basicSub)
>> score    N=5
>>
>> +-------+-+-+-----+
>> |       | |N|score|
>> +-------+-+-+-----+
>> |student|1|2|55.0 |
>> |       |2|1|60.0 |
>> |       |3|2|67.5 |
>> +-------+-+-+-----+
>> |Overall| |5|61.0 |
>> +-------+-+-+-----+
>>
>> 6. plyr (see Dennis Murphy's solution in this thread)
>>
>>
>> On Sun, Jan 3, 2010 at 10:46 PM, david hilton shanabrook
>> <dhshanab at acad.umass.edu> wrote:
>> > I want to use aggregate with the mean function on specific columns
>> >
>> > gender <- factor(c("m", "m", "f", "f", "m"))
>> > student <- c(0001, 0002, 0003, 0003, 0001)
>> > score <- c(50, 60, 70, 65, 60)
>> > basicSub <- data.frame(student, gender, score)
>> > basicSubMean <- aggregate(basicSub, by=list(basicSub$student),
>> >                           FUN=mean, na.rm=TRUE)
>> >
>> > This doesn't work: one cannot take the mean of a factor (gender).
>> > Is there any way of specifying which columns to use for the mean?
>> > I want to aggregate by student, obtaining mean scores, and assume
>> > that any other factors, e.g. gender, are unchanging within a
>> > specific student.
>> >
>> > Thanks
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>


