[R] FUN argument to return a vector in aggregate function
David Winsemius
dwinsemius at comcast.net
Wed May 5 23:56:08 CEST 2010
On May 5, 2010, at 5:32 PM, utkarshsinghal wrote:
> Extending my question further, I want to apply different FUN
> arguments on three fields and the "by" argument also contains more
> than one field.
> For example:
> set.seed(100)
> d = data.frame(a=sample(letters[1:2],20,replace=T),
>                b=sample(3,20,replace=T),c=rpois(20,1),
>                d=rbinom(20,1,0.5),e=rep(c("X","Y"),10))
>
> Now I want to split by fields "a" and "b", and want to calculate
> mean(c), sum(d) and "X"%in%e.
>
> Is there any function which can do this and return the output in a
> dataframe format. For the above example, it should ideally be a 6*5
> dataframe.
The split function is often used for such purposes.
?split
> lapply(split(d$c, list(d$a,d$b)), mean)
$a.1
[1] 0.3333333
$b.1
[1] 0
$a.2
[1] 0.25
$b.2
[1] 0.6666667
$a.3
[1] 1.4
$b.3
[1] 0.75
Your third requested function does not return a scalar when written
as x %in% "X", so that might pose problems:
> lapply(split(d$e, list(d$a,d$b)), function(x) { x %in% "X"})
$a.1
[1] FALSE FALSE TRUE
$b.1
[1] FALSE
$a.2
[1] TRUE TRUE TRUE FALSE
$b.2
[1] TRUE TRUE TRUE
$a.3
[1] FALSE FALSE FALSE FALSE TRUE
$b.3
[1] TRUE FALSE TRUE FALSE
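With the operands in the order used in the question ("X" %in% e), the test does give a single logical value per group, so it behaves like the other scalar summaries. A minimal sketch, assuming the example data frame d from the question (the name has_x is only illustrative; no output is shown because the sampled values depend on the R version):

```r
# Recreate the example data from the question.
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# "X" %in% x asks whether "X" occurs anywhere in the group,
# so each group yields exactly one TRUE or FALSE.
has_x <- sapply(split(d$e, list(d$a, d$b)), function(x) "X" %in% x)
has_x
```

any(x == "X") should be an equivalent way to write the same test.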
I believe the summaryBy function in the doBy package might be helpful.
You might also consider some of the "describe" functions in various
packages, Hmisc being the one I am familiar with. Output will
probably be a list, but if it has a regular structure, the
as.data.frame function may be effective.
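If a data frame is wanted directly, one base-R possibility is to split the whole data frame rather than a single column and build one row per group. This is only a sketch under the assumptions of the example in the question; the column names mean_c, sum_d and has_X are made up for illustration:

```r
# Recreate the example data from the question.
set.seed(100)
d <- data.frame(a = sample(letters[1:2], 20, replace = TRUE),
                b = sample(3, 20, replace = TRUE),
                c = rpois(20, 1),
                d = rbinom(20, 1, 0.5),
                e = rep(c("X", "Y"), 10))

# One piece per observed (a, b) combination; drop = TRUE skips empty ones.
pieces <- split(d, list(d$a, d$b), drop = TRUE)

# Summarise each piece into a one-row data frame.
rows <- lapply(pieces, function(g) {
  data.frame(a = g$a[1], b = g$b[1],
             mean_c = mean(g$c), sum_d = sum(g$d),
             has_X = "X" %in% g$e)
})

# Stack the rows; discard the "a.1"-style row names inherited from split().
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

The result has one row per non-empty group and five columns, which matches the 6*5 shape asked for when all six combinations of a and b occur.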
>
> Thanks in advance.
>
> Regards,
> Utkarsh Singhal
>
>
>
> On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
>> Try this:
>>
>>
>>> library(doBy)
>>> summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>>
>>   wool tension breaks.mean breaks.sum breaks.length
>> 1    A       L    44.55556        401             9
>> 2    A       M    24.00000        216             9
>> 3    A       H    24.55556        221             9
>> 4    B       L    28.22222        254             9
>> 5    B       M    28.77778        259             9
>> 6    B       H    18.77778        169             9
>>
>> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
>> <utkarsh.singhal at global-analytics.com> wrote:
>>
>>> Hi All,
>>>
>>> I am currently doing the following to compute summary statistics of
>>> aggregated data:
>>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>>> ans = cbind(a, b[,3], c[,3])
>>>
>>> This seems unnecessarily complex to me so I tried
>>>
>>>> aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z)
>>>> c(mean(z),sum(z),length(z)))
>>>>
>>> but aggregate doesn't allow FUN argument to return a vector.
>>>
>>> I tried "by", "tapply" and several other functions as well, but the
>>> output needed further modifications to get the same format as "ans"
>>> above.
>>>
>>> Is there any other function like aggregate that allows the FUN
>>> argument to return a vector?
>>>
>>> Regards
>>> Utkarsh
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>
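On the original question further down the thread: in more recent versions of R (from 2.11.0 on, as far as I know) aggregate does accept a FUN that returns a vector, and it also has a formula interface. A hedged sketch using the warpbreaks data from the earlier messages; the names res and flat are only illustrative:

```r
# warpbreaks ships with R in the datasets package.
res <- aggregate(breaks ~ wool + tension, data = warpbreaks,
                 FUN = function(z) c(mean = mean(z), sum = sum(z), n = length(z)))

# The three statistics come back as one matrix-valued column named "breaks";
# do.call(data.frame, ...) spreads it into ordinary columns.
flat <- do.call(data.frame, res)
names(flat)
```

The flattened columns should come out as breaks.mean, breaks.sum and breaks.n next to wool and tension, which is close to the layout of "ans" in the first message.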
David Winsemius, MD
West Hartford, CT