[R] how to speed up "aggregate"
analyst41 at hotmail.com
Wed Jan 26 11:39:37 CET 2011
I have
> df
   quantity branch client       date  name
1        10      1      1 2010-01-01   one
2        20      2      1 2010-01-01   one
3        30      3      2 2010-01-01   two
4        15      4      1 2010-01-01   one
5        10      5      2 2010-01-01   two
6        20      6      3 2010-01-01 three
7      1000      1      1 2011-01-01   one
8      2000      2      1 2011-01-01   one
9      3000      3      2 2011-01-01   two
10     1500      4      1 2011-01-01   one
11     1000      5      2 2011-01-01   two
12     2000      6      3 2011-01-01 three
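
To make this reproducible, here is a sketch that rebuilds the sample above. The column types are assumptions (in particular, date is stored as Date and name as character here; the real data may hold factors instead).

```r
# Rebuild the 12-row sample shown above.
# Assumption: date is a Date and name is character; the real df may differ.
df <- data.frame(
  quantity = c(10, 20, 30, 15, 10, 20, 1000, 2000, 3000, 1500, 1000, 2000),
  branch   = rep(1:6, times = 2),
  client   = rep(c(1, 1, 2, 1, 2, 3), times = 2),
  date     = as.Date(rep(c("2010-01-01", "2011-01-01"), each = 6)),
  name     = rep(c("one", "one", "two", "one", "two", "three"), times = 2),
  stringsAsFactors = FALSE
)
```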
I want to aggregate away the branch. I followed a suggestion by Gabor
(thanks) and did
> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)
  client       date quantity
1      1 2010-01-01       45
2      2 2010-01-01       40
3      3 2010-01-01       20
4      1 2011-01-01     4500
5      2 2011-01-01     4000
6      3 2011-01-01     2000
I want df$name also in the output and did what looked obvious:
> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)
  client       date  name quantity
1      1 2010-01-01   one       45
2      1 2011-01-01   one     4500
3      3 2010-01-01 three       20
4      3 2011-01-01 three     2000
5      2 2010-01-01   two       40
6      2 2011-01-01   two     4000
It seems to work, but it slows down tremendously for a data frame with
around 1000 rows.
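
One workaround that might sidestep the extra grouping column is sketched below. It assumes each client maps to exactly one name (true in the sample, unverified for the full data), so name does not need to be a grouping variable: aggregate on client and date only, then merge the names back from a small lookup table. The names agg, lookup and res are made up for illustration, and I have not timed this against the three-column call.

```r
# Sample data as above (column types are assumptions).
df <- data.frame(
  quantity = c(10, 20, 30, 15, 10, 20, 1000, 2000, 3000, 1500, 1000, 2000),
  branch   = rep(1:6, times = 2),
  client   = rep(c(1, 1, 2, 1, 2, 3), times = 2),
  date     = as.Date(rep(c("2010-01-01", "2011-01-01"), each = 6)),
  name     = rep(c("one", "one", "two", "one", "two", "three"), times = 2),
  stringsAsFactors = FALSE
)

# Step 1: aggregate on the two real grouping keys only.
agg <- aggregate(list(quantity = df$quantity),
                 list(client = df$client, date = df$date), sum)

# Step 2: build a client -> name lookup, one row per client.
lookup <- unique(df[, c("client", "name")])
# Stop early if the one-name-per-client assumption does not hold.
stopifnot(!any(duplicated(lookup$client)))

# Step 3: attach the name to the aggregated rows and sort for readability.
res <- merge(agg, lookup, by = "client")
res <- res[order(res$date, res$client), ]
```

If a client can carry more than one name, the stopifnot() check fails and the lookup step would need rethinking.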
Could anyone explain what is going on and suggest a way out?
Thanks.