Wed May 5 23:32:42 CEST 2010
Extending my earlier question: I now want to apply a different function
to each of three fields, and the "by" argument also contains more than
one field.
For example:
set.seed(100)
d = data.frame(a = sample(letters[1:2], 20, replace = TRUE),
               b = sample(3, 20, replace = TRUE),
               c = rpois(20, 1),
               d = rbinom(20, 1, 0.5),
               e = rep(c("X", "Y"), 10))
Now I want to split by fields "a" and "b", and calculate
mean(c), sum(d) and "X" %in% e within each group.
Is there a function that can do this and return the output as a
data frame? For the above example, it should ideally be a 6 x 5
data frame: one row per combination of "a" and "b", with columns for
"a", "b" and the three summaries.
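To make the target concrete, here is a rough base-R sketch of the kind of result I am after. It assumes that split(), lapply() and rbind() are an acceptable (if manual) route; the output column names mean.c, sum.d and X.in.e are made up for illustration. I am hoping there is a single function that does this more directly.

```r
# Sketch only: split on a and b, then apply a different summary to each column.
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# One piece per (a, b) combination that actually occurs in the data
pieces <- split(d, list(d$a, d$b), drop = TRUE)

# Collapse each piece to a one-row data frame (column names are hypothetical)
rows <- lapply(pieces, function(p) {
  data.frame(a      = p$a[1],
             b      = p$b[1],
             mean.c = mean(p$c),
             sum.d  = sum(p$d),
             X.in.e = "X" %in% p$e)
})

# Stack the one-row pieces into a single data frame
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

Because of drop = TRUE, a combination of "a" and "b" that never occurs in the data produces no row, so the result has at most 6 rows.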
Thanks in advance.
Regards,
Utkarsh Singhal
On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
> Try this:
>
>
>> library(doBy)
>> summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>
>   wool tension breaks.mean breaks.sum breaks.length
> 1    A       L    44.55556        401             9
> 2    A       M    24.00000        216             9
> 3    A       H    24.55556        221             9
> 4    B       L    28.22222        254             9
> 5    B       M    28.77778        259             9
> 6    B       H    18.77778        169             9
>
> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
> <utkarsh.singhal at global-analytics.com> wrote:
>
>> Hi All,
>>
>> I am currently doing the following to compute summary statistics of
>> aggregated data:
>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>> ans = cbind(a, b[,3], c[,3])
>>
>> This seems unnecessarily complex to me so I tried
>>
>>> aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
>>> c(mean(z),sum(z),length(z)))
>>>
>> but aggregate doesn't allow the FUN argument to return a vector.
>>
>> I tried "by", "tapply" and several other functions as well but the output
>> needed further modifications to get the same format as "ans" above.
>>
>> Is there any other function like aggregate that allows the FUN argument to
>> return a vector?
>>
>> Regards
>> Utkarsh
>>
