[R] SUM,COUNT,AVG

David Winsemius dwinsemius at comcast.net
Tue Apr 7 05:49:15 CEST 2009


On Apr 6, 2009, at 6:31 PM, Jun Shen wrote:

> This is a good example to compare different approaches. My understanding is:
>
> aggregate() can apply one function to multiple columns
> summarize() can apply multiple functions to one column
> I am not sure whether ddply() can actually apply multiple functions to
> multiple columns. This is what I would like to do. The syntax in the help
> is a little confusing to me. I would appreciate more comments. Thanks

This looks reasonably straightforward:


# apply each summary function in turn to every column, grouped by region
lapply(c(mean, sd, length), function(func) {
    aggregate(state.x77, list(Region = state.region), func)
})
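
A possible variation, offered only as a sketch: aggregate() also accepts a function that returns a named vector, so several statistics can be computed for several columns in a single call. The helper name `summ` and the choice of the Income and Murder columns are illustrative assumptions, not part of the thread.

```r
# Hypothetical helper: returns three statistics as one named vector.
summ <- function(x) c(mean = mean(x), sd = sd(x), n = length(x))

# state.x77 is a matrix; aggregate() converts it to a data frame internally.
res <- aggregate(state.x77[, c("Income", "Murder")],
                 by = list(Region = state.region),
                 FUN = summ)

# Each summarized column comes back as a matrix, one column per statistic.
res$Income[, "mean"]
colnames(res$Murder)
```

The result has one row per region. The matrix-valued columns can be left as they are or flattened with do.call(data.frame, res) if ordinary columns are preferred.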

-- 
David Winsemius
>
>
> Jun
>
> On Mon, Apr 6, 2009 at 9:51 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
>
>> Actually, ddply does this perfectly ... I had made a mistake in using
>> 'each'.  The correct code is:
>>
>> ddply(dd,~b+c,function(x)each(count=length,sum=sum,avg=mean)(x$a))
>>
>>   b c count sum       avg
>> 1 1 1     2  10  5.000000
>> 2 2 1     1   3  3.000000
>> 3 3 1     1  10 10.000000
>> 4 1 2     2  10  5.000000
>> 5 1 3     1   5  5.000000
>> 6 3 3     3  17  5.666667
>>
>> Hope this helps,
>>
>>            -s
>>
>>
>>
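
For readers without plyr, roughly the same count/sum/avg table can probably be produced in base R. This is a sketch under assumptions: the seed is arbitrary and added only for reproducibility, and the names `stats` and `res` are illustrative.

```r
set.seed(1)  # assumption: arbitrary seed, only so the sample data is repeatable
dd <- data.frame(a = 1:10,
                 b = sample(3, 10, replace = TRUE),
                 c = sample(3, 10, replace = TRUE))

# Formula interface: summarize column a within each (b, c) combination.
stats <- aggregate(a ~ b + c, data = dd,
                   FUN = function(x) c(count = length(x),
                                       sum   = sum(x),
                                       avg   = mean(x)))

# The statistics arrive as one matrix column; flatten it into plain columns
# named a.count, a.sum and a.avg.
res <- do.call(data.frame, stats)
res
```

Unlike the tapply() approach with interaction(), this returns only the (b, c) combinations that actually occur in the data.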
>> On Mon, Apr 6, 2009 at 10:34 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
>>
>>> There are various ways to do this in R.
>>>
>>> # sample data
>>> dd <- data.frame(a=1:10,b=sample(3,10,replace=TRUE),c=sample(3,10,replace=TRUE))
>>>
>>> Using the standard built-in functions, you can use:
>>>
>>> *** aggregate ***
>>>
>>> aggregate(dd,list(b=dd$b,c=dd$c),sum)
>>>   b c  a b c
>>> 1 1 1 10 2 2
>>> 2 2 1  3 2 1
>>> ....
>>>
>>> *** tapply ***
>>>
>>> tapply(dd$a,interaction(dd$b,dd$c),mean)   # the values shown below are group means
>>>      1.1       2.1       3.1       1.2       2.2       3.2       1.3
>>> 2.3
>>> 5.000000  3.000000 10.000000  5.000000        NA        NA  5.000000
>>> ...
>>>
>>> But the nicest way is probably to use the plyr package:
>>>
>>>> library(plyr)
>>>> ddply(dd,~b+c,function(x) sum(x$a))   # sum column a only, not the whole piece
>>>   b c V1
>>> 1 1 1 10
>>> 2 2 1  3
>>> ....
>>>
>>> ********
>>>
>>> Unfortunately, none of these approaches allows you to return more than one
>>> result from the function, so you'll need to write
>>>
>>>> ddply(dd,~b+c,function(x) length(x$a))   # count
>>>> ddply(dd,~b+c,function(x) sum(x$a))      # sum
>>>> ddply(dd,~b+c,function(x) mean(x$a))     # arithmetic average
>>>
>>> There is an 'each' function in plyr, but it doesn't seem to be compatible
>>> with ddply.
>>>
>>>               -s
>>>
>>> On Mon, Apr 6, 2009 at 5:37 AM, calpeda <mauro.biasolo at calpeda.it> wrote:
>>>
>>>>
>>>> Hi,
>>>> I've been searching a lot on the internet, but I can't find a solution.
>>>> Attached, you will find a file.
>>>> For each (Materiale, tpdv, UM) combination I need to find the sum, average
>>>> and count.
>>>> My idea was to aggregate by the 3 parameters, but I don't know how to get
>>>> the numeric values (SUM, COUNT, AVG) I need.
>>>> Can you help me?
>>>> Thank you
>>>>
>>>> http://www.nabble.com/file/p22905322/ordini2008_ex.txt ordini2008_ex.txt
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/SUM%2CCOUNT%2CAVG-tp22905322p22905322.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>
>
>
> -- 
> Jun Shen PhD
> PK/PD Scientist
> BioPharma Services
> Millipore Corporation
> 15 Research Park Dr.
> St Charles, MO 63304
> Direct: 636-720-1589
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



