[R] FUN argument to return a vector in aggregate function
Gabor Grothendieck
ggrothendieck at gmail.com
Wed May 5 23:50:09 CEST 2010
Try this:
do.call("rbind", by(d, d[1:2], function(x) with(x,
  data.frame(x[1, 1:2], `mean c` = mean(c), `sum d` = sum(d),
             `has X` = "X" %in% e, check.names = FALSE))))
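To see how the by/rbind idiom above fits together, here is a minimal sketch on the built-in warpbreaks data from the earlier part of this thread. The column names mean, sum and n are just illustrative choices, not anything required by by():

```r
# warpbreaks ships with base R (datasets package), so nothing else is needed.
# by() splits the data frame into one piece per wool/tension combination and
# applies the function to each piece; each call returns a one-row data frame.
pieces <- by(warpbreaks, warpbreaks[c("wool", "tension")], function(x)
  data.frame(x[1, c("wool", "tension")],   # the group labels, taken from row 1
             mean = mean(x$breaks),        # illustrative column names
             sum  = sum(x$breaks),
             n    = nrow(x)))

# do.call("rbind", ...) stacks the one-row pieces into a single data frame.
res <- do.call("rbind", pieces)
rownames(res) <- NULL
res
```

The key point is that the function handed to by() returns a one-row data frame, so any number of summaries can sit side by side. The rows come out in by()'s own group order, which may differ from the order shown in other outputs in this thread.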
or this (which uses 1 or 0 to mean TRUE or FALSE in the last column):
> library(sqldf) # see http://sqldf.googlecode.com
> sqldf("select a, b, avg(c) 'mean c', sum(d) 'sum d', sum(e = 'X')>0 'has X' from d group by a, b", method = "raw")
  a b    mean c sum d has X
1 a 1 0.3333333     2     1
2 a 2 0.2500000     2     1
3 a 3 1.4000000     4     1
4 b 1 0.0000000     0     0
5 b 2 0.6666667     1     1
6 b 3 0.7500000     2     1
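or this (the same call without check.names = FALSE, so the column names
become mean.c, sum.d and has.X):
do.call("rbind", by(d, d[1:2], function(x) with(x,
  data.frame(x[1, 1:2], `mean c` = mean(c), `sum d` = sum(d),
             `has X` = "X" %in% e))))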
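For the single-column case in the original question quoted below, it may also be worth noting that more recent versions of R let the FUN passed to aggregate return a vector. This is a sketch under that assumption (it will not work on older installations); the result then holds the summaries in one matrix-valued column, which do.call(data.frame, ...) spreads into ordinary columns:

```r
# Assumes an R version whose aggregate() accepts a vector-valued FUN
# and provides the formula interface.
agg <- aggregate(breaks ~ wool + tension, data = warpbreaks,
                 FUN = function(z) c(mean = mean(z), sum = sum(z),
                                     length = length(z)))

# agg$breaks is a 6 x 3 matrix at this point; rebuilding the data frame
# turns it into the separate columns breaks.mean, breaks.sum, breaks.length.
flat <- do.call(data.frame, agg)
flat
```

This gives the same columns as the summaryBy output quoted below, without an extra package, though the rows may be ordered differently.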
On Wed, May 5, 2010 at 5:32 PM, utkarshsinghal
<utkarsh.singhal at global-analytics.com> wrote:
> Extending my question further, I want to apply different FUN arguments on
> three fields and the "by" argument also contains more than one field.
> For example:
> set.seed(100)
> d = data.frame(a=sample(letters[1:2],20,replace=T), b=sample(3,20,replace=T),
>                c=rpois(20,1), d=rbinom(20,1,0.5), e=rep(c("X","Y"),10))
>
> Now I want to split by fields "a" and "b", and want to calculate mean(c),
> sum(d) and "X"%in%e.
>
> Is there any function which can do this and return the output as a data
> frame? For the above example, it should ideally be a 6*5 data frame.
>
> Thanks in advance.
>
> Regards,
> Utkarsh Singhal
>
>
>
> On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
>>
>> Try this:
>>
>>
>>>
>>> library(doBy)
>>> summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>>
>>
>>   wool tension breaks.mean breaks.sum breaks.length
>> 1    A       L    44.55556        401             9
>> 2    A       M    24.00000        216             9
>> 3    A       H    24.55556        221             9
>> 4    B       L    28.22222        254             9
>> 5    B       M    28.77778        259             9
>> 6    B       H    18.77778        169             9
>>
>> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
>> <utkarsh.singhal at global-analytics.com> wrote:
>>
>>>
>>> Hi All,
>>>
>>> I am currently doing the following to compute summary statistics of
>>> aggregated data:
>>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>>> ans = cbind(a, b[,3], c[,3])
>>>
>>> This seems unnecessarily complex to me so I tried
>>>
>>>>
>>>> aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
>>>> c(mean(z),sum(z),length(z)))
>>>>
>>>
>>> but aggregate doesn't allow FUN argument to return a vector.
>>>
>>> I tried "by", "tapply" and several other functions as well but the output
>>> needed further modifications to get the same format as "ans" above.
>>>
>>> Is there any other function like aggregate that allows the FUN argument
>>> to return a vector?
>>>
>>> Regards
>>> Utkarsh
>>>