[R] FUN argument to return a vector in aggregate function

David Winsemius dwinsemius at comcast.net
Wed May 5 23:56:08 CEST 2010


On May 5, 2010, at 5:32 PM, utkarshsinghal wrote:

> Extending my question further, I want to apply a different FUN to
> each of three fields, and the "by" argument also contains more
> than one field.
> For example:
> set.seed(100)
> d = data.frame(a=sample(letters[1:2], 20, replace=T),
>                b=sample(3, 20, replace=T), c=rpois(20,1),
>                d=rbinom(20,1,0.5), e=rep(c("X","Y"), 10))
>
> Now I want to split by fields "a" and "b" and calculate
> mean(c), sum(d) and "X" %in% e.
>
> Is there any function that can do this and return the output as a
> data frame? For the above example, it should ideally be a 6 x 5
> data frame.

The split function is often used for such purposes.

?split

 > lapply(split(d$c, list(d$a,d$b)), mean)
$a.1
[1] 0.3333333

$b.1
[1] 0

$a.2
[1] 0.25

$b.2
[1] 0.6666667

$a.3
[1] 1.4

$b.3
[1] 0.75
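
Going one step further than the single-statistic example above, here is a minimal sketch (assuming the data frame d from the question) that splits the whole data frame on a and b, computes all three requested statistics for each piece, and stacks the one-row results with do.call(rbind, ...). The helper name group_stats and the column names mean.c, sum.d and has.X are hypothetical choices for illustration.

```r
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# Hypothetical helper: returns a one-row data frame of statistics
# for a single a/b group
group_stats <- function(g) {
  data.frame(a = g$a[1], b = g$b[1],
             mean.c = mean(g$c),       # mean(c)
             sum.d  = sum(g$d),        # sum(d)
             has.X  = "X" %in% g$e)    # does "X" occur in e?
}

# drop = TRUE discards a/b combinations that never occur in the data
pieces <- split(d, list(d$a, d$b), drop = TRUE)
res <- do.call(rbind, lapply(pieces, group_stats))
rownames(res) <- NULL
res
```

Because the grouping values are carried along as ordinary columns, the result has the two grouping fields plus one column per statistic, i.e. the five-column layout asked for (one row per a/b combination present in the data).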

Your third requested function, written elementwise as x %in% "X",
does not return a scalar, so that might pose problems:

 > lapply(split(d$e, list(d$a,d$b)), function(x) { x %in% "X"})
$a.1
[1] FALSE FALSE  TRUE

$b.1
[1] FALSE

$a.2
[1]  TRUE  TRUE  TRUE FALSE

$b.2
[1] TRUE TRUE TRUE

$a.3
[1] FALSE FALSE FALSE FALSE  TRUE

$b.3
[1]  TRUE FALSE  TRUE FALSE
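
The original request was written the other way round, "X" %in% e, and that form asks whether "X" occurs anywhere in the group, so it does return a single logical per group. A minimal sketch with sapply, assuming the same d as in the question:

```r
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# "X" %in% x gives one TRUE/FALSE for the whole group, so sapply can
# simplify the list of results to a named logical vector
has_x <- sapply(split(d$e, list(d$a, d$b)), function(x) "X" %in% x)
has_x
```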

I believe the summaryBy function in the doBy package might be helpful.
You might also consider some of the "describe" functions in various
packages, Hmisc being the one I am most familiar with. The output will
probably be a list, but if it has a regular structure, the
as.data.frame function may be effective.
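
For comparison with the summaryBy output quoted below, here is a minimal base-R sketch, assuming a recent R release in which aggregate() accepts a FUN that returns a vector (the poster below reports that the version they used did not). The vector-valued result is stored as a matrix column, which do.call(data.frame, ...) flattens into separate columns. The helper name stats_fun is hypothetical.

```r
# Hypothetical helper returning three named statistics at once
stats_fun <- function(z) c(mean = mean(z), sum = sum(z), length = length(z))

# Formula interface: group breaks by wool and tension; the three results
# come back as a single matrix column named "breaks"
res <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = stats_fun)

# Flatten the matrix column into breaks.mean, breaks.sum, breaks.length
flat <- do.call(data.frame, res)
flat
```

The rows come out with wool varying fastest (A/L, B/L, A/M, ...), so they are ordered differently from the summaryBy table, but the values per wool/tension combination should agree.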

>
> Thanks in advance.
>
> Regards,
> Utkarsh Singhal
>
>
>
> On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
>> Try this:
>>
>>
>>> library(doBy)
>>> summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>>
>>   wool tension breaks.mean breaks.sum breaks.length
>> 1    A       L    44.55556        401             9
>> 2    A       M    24.00000        216             9
>> 3    A       H    24.55556        221             9
>> 4    B       L    28.22222        254             9
>> 5    B       M    28.77778        259             9
>> 6    B       H    18.77778        169             9
>>
>> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
>> <utkarsh.singhal at global-analytics.com>  wrote:
>>
>>> Hi All,
>>>
>>> I am currently doing the following to compute summary statistics of
>>> aggregated data:
>>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>>> ans = cbind(a, b[,3], c[,3])
>>>
>>> This seems unnecessarily complex to me so I tried
>>>
>>>> aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
>>>> c(mean(z),sum(z),length(z)))
>>>>
>>> but aggregate doesn't allow the FUN argument to return a vector.
>>>
>>> I tried "by", "tapply" and several other functions as well, but the
>>> output needed further modifications to get the same format as "ans" above.
>>>
>>> Is there any other function like aggregate that allows the FUN
>>> argument to return a vector?
>>>
>>> Regards
>>> Utkarsh
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>

David Winsemius, MD
West Hartford, CT