[R] mean for subset

Tue Jan 5 20:22:04 CET 2010

Here is a solution using sqldf, which can do it in one statement. The
"having count(*) = 3" clause keeps only the names with exactly 3 observations:

> # read in data
> Lines <- "OBS  NAME   SCORE
+ 1    Tom    92
+ 2    Tom    88
+ 3    Tom    56
+ 4    James  85
+ 5    James  75
+ 6    James  32
+ 7    Dawn   56
+ 8    Dawn   91
+ 9    Clara  95
+ 10   Clara  84"
>
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> # run
> library(sqldf)
> sqldf("select NAME, avg(SCORE) from DF group by NAME having count(*) = 3")
   NAME avg(SCORE)
1 James   64.00000
2   Tom   78.66667
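
For comparison, the same "filter groups by size, then average" idea can be
sketched in base R without sqldf. This is not part of the original reply; it
is a minimal sketch that rebuilds the example data inline, and the names
n_per_name and res are made up for illustration:

```r
# Same example data as in the post, built directly as a data frame
DF <- data.frame(
  OBS   = 1:10,
  NAME  = c("Tom", "Tom", "Tom", "James", "James", "James",
            "Dawn", "Dawn", "Clara", "Clara"),
  SCORE = c(92, 88, 56, 85, 75, 32, 56, 91, 95, 84)
)

# Number of observations in each row's NAME group, aligned row-by-row with DF
n_per_name <- ave(DF$SCORE, DF$NAME, FUN = length)

# Keep only rows whose group has exactly 3 observations,
# then take the mean of SCORE within each remaining NAME
res <- aggregate(SCORE ~ NAME, data = DF[n_per_name == 3, ], FUN = mean)
print(res)
```

Here ave() plays the role of count(*) and the row filter plays the role of
the having clause, so Dawn and Clara (2 observations each) are dropped before
the means are computed.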

On Tue, Jan 5, 2010 at 2:03 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Have a look at this post and the rest of that thread:
>
> https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html
>
> On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <gps at asu.edu> wrote:
>> Hello, does anyone know how to take the mean for a subset of observations?
>> For example, suppose my data looks like this:
>>
>> OBS  NAME   SCORE
>> 1    Tom    92
>> 2    Tom    88
>> 3    Tom    56
>> 4    James  85
>> 5    James  75
>> 6    James  32
>> 7    Dawn   56
>> 8    Dawn   91
>> 9    Clara  95
>> 10   Clara  84
>>
>> Is there a way to get the mean of the SCORE variable by NAME but only when
>> the number of observations is equal to 3?  In other words, is there a way to
>> get the mean of the SCORE variable for Tom and James, but not for Dawn and
>> Clara?  Thank you.
>>
>> --
>> Geoffrey Smith
>> Visiting Assistant Professor
>> Department of Finance
>> W. P. Carey School of Business
>> Arizona State University