[R] SUM,COUNT,AVG
David Winsemius
dwinsemius at comcast.net
Tue Apr 7 05:49:15 CEST 2009
On Apr 6, 2009, at 6:31 PM, Jun Shen wrote:
> This is a good example to compare different approaches. My
> understanding is:
>
> aggregate() can apply one function to multiple columns
> summarize() can apply multiple functions to one column
>
> I am not sure whether ddply() can actually apply multiple functions to
> multiple columns. This is what I would like to do. The syntax in the
> help is a little confusing to me. I would appreciate more comments. Thanks.
This looks reasonably straightforward:
lapply(c(mean, sd, length), function(func) {
    aggregate(state.x77, list(Region = state.region), func)
})
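
The list returned above is unnamed, so it is not obvious which element holds which statistic. A minimal sketch of two variations follows. The names `funs`, `res` and `one_call` are only illustrative, and the second form assumes a version of R whose aggregate() accepts a function returning more than one value (I believe this is the case in current releases, but check ?aggregate).

```r
# Variation 1: pass a *named* list of functions so each result is labelled
funs <- list(mean = mean, sd = sd, count = length)
res <- lapply(funs, function(func) {
    aggregate(state.x77, list(Region = state.region), func)
})
names(res)           # one data frame per statistic
res$mean[, 1:3]      # regional means for the first two variables

# Variation 2 (assumes aggregate() accepts vector-valued functions):
# several statistics for several columns in a single call. Each summarised
# column comes back as a matrix with one column per statistic.
one_call <- aggregate(state.x77[, c("Income", "Murder")],
                      list(Region = state.region),
                      function(x) c(mean = mean(x), sd = sd(x), count = length(x)))
one_call$Income
```

The first form keeps one tidy data frame per statistic; the second keeps everything in one object at the cost of matrix-valued columns, which can be awkward to print or merge.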
--
David Winsemius
>
>
> Jun
>
> On Mon, Apr 6, 2009 at 9:51 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
>
>> Actually, ddply does this perfectly ... I had made a mistake in using
>> 'each'. The correct code is:
>>
>> ddply(dd,~b+c,function(x)each(count=length,sum=sum,avg=mean)(x$a))
>>
>>   b c count sum       avg
>> 1 1 1     2  10  5.000000
>> 2 2 1     1   3  3.000000
>> 3 3 1     1  10 10.000000
>> 4 1 2     2  10  5.000000
>> 5 1 3     1   5  5.000000
>> 6 3 3     3  17  5.666667
>>
>> Hope this helps,
>>
>> -s
>>
>>
>>
>> On Mon, Apr 6, 2009 at 10:34 AM, Stavros Macrakis <macrakis at alum.mit.edu> wrote:
>>
>>> There are various ways to do this in R.
>>>
>>> # sample data
>>> dd <- data.frame(a=1:10, b=sample(3,10,replace=TRUE), c=sample(3,10,replace=TRUE))
>>>
>>> Using the standard built-in functions, you can use:
>>>
>>> *** aggregate ***
>>>
>>> aggregate(dd,list(b=dd$b,c=dd$c),sum)
>>>   b c  a b c
>>> 1 1 1 10 2 2
>>> 2 2 1  3 2 1
>>> ....
>>>
>>> *** tapply ***
>>>
>>> tapply(dd$a,interaction(dd$b,dd$c),sum)
>>>       1.1       2.1       3.1       1.2       2.2       3.2       1.3       2.3
>>>  5.000000  3.000000 10.000000  5.000000        NA        NA  5.000000
>>> ...
>>>
>>> But the nicest way is probably to use the plyr package:
>>>
>>> > library(plyr)
>>> > ddply(dd,~b+c,sum)
>>>   b c V1
>>> 1 1 1 14
>>> 2 2 1  6
>>> ....
>>>
>>> ********
>>>
>>> Unfortunately, none of these approaches allows you to return more
>>> than one result from the function, so you'll need to write
>>>
>>> > ddply(dd,~b+c,length) # count
>>> > ddply(dd,~b+c,sum)
>>> > ddply(dd,~b+c,mean) # arithmetic average
>>>
>>> There is an 'each' function in plyr, but it doesn't seem to be
>>> compatible with ddply.
>>>
>>> -s
>>>
>>> On Mon, Apr 6, 2009 at 5:37 AM, calpeda <mauro.biasolo at calpeda.it> wrote:
>>>
>>>>
>>>> Hi,
>>>> I've been searching a lot on the internet, but I can't find a solution.
>>>> Attached, you will find a file.
>>>> For each (Materiale, tpdv, UM) combination, I need to find the sum, avg and count.
>>>> My idea was to aggregate on the 3 parameters, but I don't know how to
>>>> get the numeric values (SUM, COUNT, AVG) I need.
>>>> Can you help me?
>>>> Thank you
>>>>
>>>> http://www.nabble.com/file/p22905322/ordini2008_ex.txt ordini2008_ex.txt
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/SUM%2CCOUNT%2CAVG-tp22905322p22905322.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>
>
>
> --
> Jun Shen PhD
> PK/PD Scientist
> BioPharma Services
> Millipore Corporation
> 15 Research Park Dr.
> St Charles, MO 63304
> Direct: 636-720-1589
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT