[R] Two Problems while trying to aggregate a dataframe
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Mar 24 19:17:05 CET 2007
Try this:
aggregate(atest[3:4], atest[1:2], sum)
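As a rough illustration of why this call returns both value columns: the first argument holds every column to be summed (columns 3 and 4, Betrag1 and Betrag2), and the second holds the grouping columns (columns 1 and 2, PrsNr and Namen). The call in the quoted post passed only Betrag1 in the first list, so Betrag2 was never summed. The sketch below retypes the sample data from the quoted post by hand, so it runs without the CSV file; the name "res" is only chosen for this example.

```r
# Sample data retyped from the quoted post (stands in for aggregatetest.csv)
atest <- data.frame(
  PrsNr   = c(1L, 2L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# atest[3:4]: the columns to sum; atest[1:2]: the grouping columns.
# Both are data frames (i.e. lists), which is what aggregate() expects.
res <- aggregate(atest[3:4], atest[1:2], sum)

# One row per person, with both Betrag columns summed
print(res)
```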
Use a database and SQL if you don't otherwise have enough
computer resources.
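A minimal sketch of the database route, under some assumptions: the message above names no particular database or package, so the DBI and RSQLite packages and the in-memory SQLite database used here are only one possible choice, and the table name "atest" and result name "res_sql" are hypothetical. With the real data one would more likely load the rows into an on-disk database rather than build the full data frame in R first.

```r
# Sample data retyped from the quoted post
atest <- data.frame(
  PrsNr   = c(1L, 2L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81),
  stringsAsFactors = FALSE
)

# Assumption: DBI and RSQLite are installed; neither is named in the thread.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory SQLite database, used here only for illustration
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "atest", atest)

  # Let the database do the grouping and summing
  res_sql <- DBI::dbGetQuery(con, "
    SELECT PrsNr, Namen,
           SUM(Betrag1) AS Betrag1,
           SUM(Betrag2) AS Betrag2
    FROM atest
    GROUP BY PrsNr, Namen
    ORDER BY PrsNr")

  DBI::dbDisconnect(con)
  print(res_sql)
} else {
  message("This sketch needs the DBI and RSQLite packages.")
}
```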
On 3/24/07, Delcour Libertus <delcour.libertus at gmail.com> wrote:
> Hello!
>
> Given is an Excel sheet with currently 11,000 rows and 9 columns. I want
> to work with the data in R. The contents are similar to my following
> example.
>
> I have a list with ID-number, personal name and two kinds of
> loan-values. I want to aggregate the list so that for each person only one
> row remains and the loan-values are summed.
>
> First I tried some commands with tapply but had no success at all. Then
> I found in this mailing list a hint for aggregate (though I did not
> understand most of that mail).
>
> So I made some efforts with aggregate() and it seems to lead the right way:
>
> [code]
> > atest <- read.csv2 ("aggregatetest.csv")
> > str(atest)
> `data.frame': 10 obs. of 4 variables:
> $ PrsNr : int 1 2 2 3 4 5 6 6 6 7
> $ Namen : Factor w/ 7 levels "Holla","Mabba",..: 1 2 2 4 5 6 7 7 7 3
> $ Betrag1: num 1.99 2.34 5.23 4.23 2.23 2.77 3.83 2.76 6.32 2.88
> $ Betrag2: num 3.44 5.32 5.21 9.12 7.32 8.32 6.99 4.45 5.34 3.81
> > atest
> PrsNr Namen Betrag1 Betrag2
> 1 1 Holla 1.99 3.44
> 2 2 Mabba 2.34 5.32
> 3 2 Mabba 5.23 5.21
> 4 3 Pisa 4.23 9.12
> 5 4 Pulla 2.23 7.32
> 6 5 Raba 2.77 8.32
> 7 6 Saba 3.83 6.99
> 8 6 Saba 2.76 4.45
> 9 6 Saba 6.32 5.34
> 10 7 Mulla 2.88 3.81
> > aggregate(list(Betrag1=atest$Betrag1), by=list(PsrNr=atest$PrsNr,
> Namen=atest$Namen), sum)
> PsrNr Namen Betrag1
> 1 1 Holla 1.99
> 2 2 Mabba 7.57
> 3 7 Mulla 2.88
> 4 3 Pisa 4.23
> 5 4 Pulla 2.23
> 6 5 Raba 2.77
> 7 6 Saba 12.91
> [/code]
>
> The result is nearly what I want.
>
> First problem:
>
> How do I get all columns in my result? "Betrag2" is missing.
>
> Second problem:
>
> If I use the aggregate command on the real data, it is impossible for me
> to use more than one by-grouping variable (my example above
> has two). Impossible because 1 GB RAM and 1.5 GB swap are not enough to
> process my command. My computer (Ubuntu Linux, GNOME) freezes. So I
> doubt whether I am using the appropriate method to reach my goal.
>
> Which is the best way to aggregate data frames as I want? Are there any
> better functions/commands or do I have to learn programming for this?
>
> Greetings
>
> Delcour
>