[R] how to make aggregation in R ?
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Mar 20 02:37:14 CET 2009
Here are two solutions:
> aggregate(testDF[c("n1", "n2")], testDF[c("v1", "v2")], sum)
v1 v2 n1 n2
1 a a1 6 66
2 a a2 4 24
3 a a3 5 25
4 b b1 13 53
5 b b2 27 87
6 c c1 11 31
7 c c2 39 99
8 c c3 15 35
9 d d1 16 36
10 d d2 17 37
11 d d3 18 38
12 d d4 39 79
> library(sqldf)
> sqldf("select v1, v2, sum(n1), sum(n2) from testDF group by v1, v2")
v1 v2 sum(n1) sum(n2)
1 a a1 6 66
2 a a2 4 24
3 a a3 5 25
4 b b1 13 53
5 b b2 27 87
6 c c1 11 31
7 c c2 39 99
8 c c3 15 35
9 d d1 16 36
10 d d2 17 37
11 d d3 18 38
12 d d4 39 79
On Thu, Mar 19, 2009 at 9:18 PM, Ferry <fmi.mlist at gmail.com> wrote:
> Hi,
>
> I think I found the solution.
> Using doBy library, I got:
>
> testDF.result2 <- summaryBy(n1+n2 ~ v1+v2, data = testDF, FUN=sum)
>
>> testDF.result2
> v1 v2 n1.sum n2.sum
> 1 a a1 6 66
> 2 a a2 4 24
> 3 a a3 5 25
> 4 b b1 13 53
> 5 b b2 27 87
> 6 c c1 11 31
> 7 c c2 39 99
> 8 c c3 15 35
> 9 d d1 16 36
> 10 d d2 17 37
> 11 d d3 18 38
> 12 d d4 39 79
>
> In any case, did I do something wrong using the aggregate function?
>
> Thanks,
>
> Ferry
>
> On Thu, Mar 19, 2009 at 6:09 PM, Ferry <fmi.mlist at gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to aggregate the sum of my test data.frame as follow:
>>
>> testDF <- data.frame(v1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b",
>> "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
>> v2 = c("a1", "a1", "a1", "a2", "a3", "b1", "b1", "b2",
>> "b2", "b2", "c1", "c2", "c2", "c2", "c3", "d1", "d2", "d3", "d4", "d4"),
>> n1 = 1:20,
>> n2 = 21:40 )
>>
>> testDF <- orderBy( ~ v1+v2, data = testDF)
>> rownames(testDF) <- NULL
>>
>> > testDF
>> v1 v2 n1 n2
>> 1 a a1 1 21
>> 2 a a1 2 22
>> 3 a a1 3 23
>> 4 a a2 4 24
>> 5 a a3 5 25
>> 6 b b1 6 26
>> 7 b b1 7 27
>> 8 b b2 8 28
>> 9 b b2 9 29
>> 10 b b2 10 30
>> 11 c c1 11 31
>> 12 c c2 12 32
>> 13 c c2 13 33
>> 14 c c2 14 34
>> 15 c c3 15 35
>> 16 d d1 16 36
>> 17 d d2 17 37
>> 18 d d3 18 38
>> 19 d d4 19 39
>> 20 d d4 20 40
>> >
>>
>> testDF.result <- aggregate(list(testDF$n1, testDF$n2), by = list(testDF$v1,
>> testDF$v2), FUN = sum)
>>
>> > testDF.result
>> Group.1 Group.2 X1.20 X21.40
>> 1 a a1 6 66
>> 2 a a2 4 24
>> 3 a a3 5 25
>> 4 b b1 13 53
>> 5 b b2 27 87
>> 6 c c1 11 31
>> 7 c c2 39 99
>> 8 c c3 15 35
>> 9 d d1 16 36
>> 10 d d2 17 37
>> 11 d d3 18 38
>> 12 d d4 39 79
>> >
>>
>>
>> However, when I applied it to my real data, it failed. It seems that
>> aggregate require more memory that I have currently (I am using WinXP,
>> R2.8.0, 2GB RAM).
>>
>> Basically I want to perform aggregate sum on my numeric fields (in the
>> above case, n1 and n2) based on condition of v1 and v2.
>>
>> Problem is, I have a lot more of than just two numerics and conditioning
>> fields.
>>
>> In SQL, I would do:
>> select v1, v2, sum(n1), sum(n2) from myData
>> group by v1, v2;
>>
>> Am I using a wrong function / library (or even wrong approach)? If so, can
>> you suggest which one?
>>
>> Any pointer is really appreciated.
>>
>> Thanks,
>>
>> Ferry
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list