[R] merging with aggregating

Dubravko Dolic Dubravko.Dolic at komdat.com
Tue Dec 6 15:19:20 CET 2005


Hi all,

the moment you hit the 'send' button you know the answer...

I approached a solution similar to this one given by Marc. But maybe there is a better one? Even because this operation is done in a for-loop during which R gets new data from a database. So I sum up 16 data.frames eventually.

Dubro


-----Ursprüngliche Nachricht-----
Von: Marc Schwartz [mailto:MSchwartz at mn.rr.com] 
Gesendet: Dienstag, 6. Dezember 2005 15:11
An: Dubravko Dolic
Cc: r-help at stat.math.ethz.ch
Betreff: Re: [R] merging with aggregating

On Tue, 2005-12-06 at 14:22 +0100, Dubravko Dolic wrote:
> Dear List,
> 
> I have two data.frame of the following form:
> 
> A:
> 
> n  V1 V2
> 1  12  0 
> 2  10  8
> 3   3  8 
> 4   8  4
> 6   7  3  
> 7  12  0 
> 8   1  0 
> 9  18  0 
> 10  1  0
> 13  2  0
> 
> B:
> 
> n  V1 V2
> 1   0  2
> 2   0  3
> 3   1  9
> 4  12  8 
> 5   2  9
> 6   2  9
> 8   2  0
> 10  4  1
> 11  7  1
> 12  0  1
> 
> 
> Now I want to merge those frame to one data.frame with summing up the
> columns V1 and V2 but not the column n. So the result in this example
> would be:
> 
> AB:
> 
> n  V1 V2
> 1  12  2
> 2  10 11 
> 3   4 17
> 4  20 12
> 5   2  9
> 6   9 12
> 7  12  0
> 8   3  0
> 9  18  0
> 10  5  1
> 11  7  1
> 12  0  1
> 13  2  0 
> 
> 
> So Columns V1 and V2 are the sum of A und B while n has its old value.
> Notice that there are different rows in n of A and B.
> 
> I don't have a clue how to start here. Any hint is welcome.
> 
> Thanks

There might be a somewhat easier way, but here is one approach:

# Use merge() to join A and B on 'n'
# Set all = TRUE to include non-matched rows

> C <- merge(A, B, by = "n", all = TRUE)

> C
    n V1.x V2.x V1.y V2.y
1   1   12    0    0    2
2   2   10    8    0    3
3   3    3    8    1    9
4   4    8    4   12    8
5   5   NA   NA    2    9
6   6    7    3    2    9
7   7   12    0   NA   NA
8   8    1    0    2    0
9   9   18    0   NA   NA
10 10    1    0    4    1
11 11   NA   NA    7    1
12 12   NA   NA    0    1
13 13    2    0   NA   NA


# Now get the rowSums() for the V1/V2 column pairs
# and create a new dataframe from the 
# results

> AB <- data.frame(n = C$n, 
                   V1 = rowSums(C[, c(2, 4)], na.rm = TRUE), 
                   V2 = rowSums(C[, c(3, 5)], na.rm = TRUE))


> AB
    n V1 V2
1   1 12  2
2   2 10 11
3   3  4 17
4   4 20 12
5   5  2  9
6   6  9 12
7   7 12  0
8   8  3  0
9   9 18  0
10 10  5  1
11 11  7  1
12 12  0  1
13 13  2  0


See ?merge and ?rowSums for more information.

HTH,

Marc Schwartz




More information about the R-help mailing list