[R] Two Problems while trying to aggregate a dataframe

Delcour Libertus delcour.libertus at gmail.com
Sat Mar 24 18:35:59 CET 2007


Hello!

I have an Excel sheet that currently has 11,000 rows and 9 columns. I want
to work with the data in R. The contents are similar to the following
example.

I have a list with an ID number, a personal name and two kinds of
loan values. I want to aggregate the list so that only one row remains
for each person, with the loan values summed.

First I tried some commands with tapply but had no success at all. Then
I found in this mailing list a hint for aggregate (though I did not
understand most of that mail).

So I experimented with aggregate(), and it seems to lead in the right direction:

[code]
> atest <- read.csv2 ("aggregatetest.csv")
> str(atest)
`data.frame':   10 obs. of  4 variables:
 $ PrsNr  : int  1 2 2 3 4 5 6 6 6 7
 $ Namen  : Factor w/ 7 levels "Holla","Mabba",..: 1 2 2 4 5 6 7 7 7 3
 $ Betrag1: num  1.99 2.34 5.23 4.23 2.23 2.77 3.83 2.76 6.32 2.88
 $ Betrag2: num  3.44 5.32 5.21 9.12 7.32 8.32 6.99 4.45 5.34 3.81
> atest
   PrsNr Namen Betrag1 Betrag2
1      1 Holla    1.99    3.44
2      2 Mabba    2.34    5.32
3      2 Mabba    5.23    5.21
4      3  Pisa    4.23    9.12
5      4 Pulla    2.23    7.32
6      5  Raba    2.77    8.32
7      6  Saba    3.83    6.99
8      6  Saba    2.76    4.45
9      6  Saba    6.32    5.34
10     7 Mulla    2.88    3.81
> aggregate(list(Betrag1=atest$Betrag1),  by=list(PsrNr=atest$PrsNr,
Namen=atest$Namen),  sum)
  PsrNr Namen Betrag1
1     1 Holla    1.99
2     2 Mabba    7.57
3     7 Mulla    2.88
4     3  Pisa    4.23
5     4 Pulla    2.23
6     5  Raba    2.77
7     6  Saba   12.91
[/code]

The result is nearly what I want.

First problem:

How do I get all columns in my result? "Betrag2" is missing.
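
A minimal sketch of one way to keep both value columns: pass a data frame with several columns as the first argument of aggregate(), so that the summary function is applied to each column within every group. The data frame is rebuilt by hand from the listing above instead of being read from aggregatetest.csv, and the name `result` is only illustrative.

```r
# Rebuild the example data by hand (values copied from the listing above).
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# Both value columns go in as the first argument; sum() is then applied
# to each of them separately within every PrsNr/Namen group.
result <- aggregate(atest[, c("Betrag1", "Betrag2")],
                    by = list(PrsNr = atest$PrsNr, Namen = atest$Namen),
                    FUN = sum)
print(result)
```

This still groups by two variables, so it addresses only the missing column, not the memory use on the full data.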

Second problem:

If I use the aggregate command on the real data, it is impossible for me
to use more than one by-grouping variable (my example above has two).
Impossible because 1 GB RAM and 1.5 GB swap are not enough to process
my command. My computer (Ubuntu Linux, GNOME) freezes. So I doubt
whether I am using the appropriate method to reach my goal.
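
A possible alternative that groups by a single variable, sketched under the assumption that every PrsNr belongs to exactly one name (true in the example, unverified for the real data): sum the value columns per PrsNr with rowsum() and attach the names afterwards. Whether this is enough to avoid the memory problem on the full data is not something the example can show. The names `sums`, `ids`, `keys` and `result` are illustrative.

```r
# Rebuild the example data by hand (values copied from the listing above).
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla",
              "Raba", "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# Sum both value columns per PrsNr only (one grouping variable).
sums <- rowsum(atest[, c("Betrag1", "Betrag2")], group = atest$PrsNr)

# With the default reorder = TRUE, the rows of 'sums' follow
# sort(unique(group)), so the IDs can be rebuilt in the same order.
ids <- sort(unique(atest$PrsNr))

# Assumption: one name per PrsNr, so the first occurrence is enough.
keys <- atest[!duplicated(atest$PrsNr), c("PrsNr", "Namen")]

result <- data.frame(
  PrsNr   = ids,
  Namen   = keys$Namen[match(ids, keys$PrsNr)],
  Betrag1 = sums[, "Betrag1"],
  Betrag2 = sums[, "Betrag2"]
)
print(result)
```

If one ID can appear with several different names in the real data, this sketch would silently keep only the first name, so that should be checked beforehand.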

What is the best way to aggregate data frames as I want? Are there any
better functions/commands, or do I have to learn programming for this?

Greetings

Delcour


