[R] SUM,COUNT,AVG
hadley wickham
h.wickham at gmail.com
Mon Apr 6 16:56:05 CEST 2009
On Mon, Apr 6, 2009 at 9:34 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
> There are various ways to do this in R.
>
> # sample data
> dd <- data.frame(a=1:10,b=sample(3,10,replace=TRUE),c=sample(3,10,replace=TRUE))
>
> Using the standard built-in functions, you can use:
>
> *** aggregate ***
>
> aggregate(dd,list(b=dd$b,c=dd$c),sum)
> b c a b c
> 1 1 1 10 2 2
> 2 2 1 3 2 1
> ....
>
> *** tapply ***
>
> tapply(dd$a,interaction(dd$b,dd$c),sum)
> 1.1 2.1 3.1 1.2 2.2 3.2 1.3 2.3
> 5.000000 3.000000 10.000000 5.000000 NA NA 5.000000
> ...
>
> But the nicest way is probably to use the plyr package:
>
>> library(plyr)
>> ddply(dd,~b+c,sum)
> b c V1
> 1 1 1 14
> 2 2 1 6
> ....
>
> ********
>
> Unfortunately, none of these approaches allows you to return more than one
> result from the function, so you'll need to write
>
>> ddply(dd,~b+c,length) # count
>> ddply(dd,~b+c,sum)
>> ddply(dd,~b+c,mean) # arithmetic average
>
> There is an 'each' function in plyr, but it doesn't seem to be compatible
> with ddply.
That's because ddply applies the function to the whole data frame, not
just the columns that aren't participating in the split. One way
around it is:
ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))
I haven't figured out a more elegant way to specify this yet.
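
For anyone trying this out, here is a minimal sketch of the workaround above on the sample data from the quoted message. The set.seed() call and the names res, res2, n, total and avg are assumptions added for illustration, not part of the original exchange. The second call uses plyr's summarise() helper, which is available in later plyr releases and may read more cleanly; treat it as one possible alternative rather than the recommended form.

```r
library(plyr)

# Sample data from the quoted message; the seed is an assumption added
# here so that repeated runs give the same groups.
set.seed(1)
dd <- data.frame(a = 1:10,
                 b = sample(3, 10, replace = TRUE),
                 c = sample(3, 10, replace = TRUE))

# ddply hands each piece to the function as a whole data frame, so
# length() counts columns rather than rows.
length(dd)  # 3 (columns a, b, c)
nrow(dd)    # 10

# Workaround from the message: extract column a, then apply each().
res <- ddply(dd, ~ b + c, function(df) each(length, sum, mean)(df$a))

# Possible alternative (assumes a plyr version that provides summarise()):
# name every statistic explicitly and compute it on column a.
res2 <- ddply(dd, ~ b + c, summarise,
              n = length(a), total = sum(a), avg = mean(a))

print(res)
print(res2)
```

Both calls should return one row per observed combination of b and c, with the count, sum and mean of a in separate columns.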
Hadley
--
http://had.co.nz/