[R] merging data.frames of different length

Don MacQueen macq at llnl.gov
Thu Jun 18 17:39:58 CEST 2009


The word "merge" in the context of R suggests the use of the merge() 
function, but I don't think that's the right tool for what you want. 
The merge() function does relational-database-style joins, which for 
your data would be a many-to-many merge: every row of x1 with a given 
value would be paired with every row of x2 with that value. Not good.
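
To illustrate the many-to-many behaviour, here is a small sketch. The two tiny data frames are made up for the illustration (they are just the rows with value 1 from the example), and the column names V1 and V2 are the defaults that as.data.frame(matrix(...)) would give:

```r
## Hypothetical cut-down data: only the rows whose first column is 1
x1 <- data.frame(V1 = c(1, 1, 1), V2 = c(4, 3, 6))
x2 <- data.frame(V1 = c(1, 1),    V2 = c(-3, -7))

## merge() pairs every x1 row with every x2 row that shares the key
m <- merge(x1, x2, by = "V1")

nrow(m)    # 3 * 2 = 6 rows, not the 2 rows that are wanted
names(m)   # "V1" "V2.x" "V2.y"
```

So with a plain merge() on the first column, each group grows to the product of the two group sizes rather than shrinking to the smaller one.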

In terms of the R language, you're looking for something using the 
cbind() function, not the merge() function (I think).

There are a couple of details that need to be clarified, and my 
solution below made some assumptions.

1) Could a value in the first column appear in only one of the two data frames?

2) Is it always x1 that has more values? (In your example, x1 had the 
number 1 appear three times in the first column, and x2 had it appear 
only twice.) Does x2 sometimes have more rows? (I think your 
description implies that, but it's good to be explicit.)
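
Both questions can be checked directly from the data by counting rows per value in each data frame. The following is only a sketch; the small data frames and the names keys, counts, only.in.one and x2.has.more are invented for the illustration:

```r
## Hypothetical data: value 4 occurs only in x1, and x2 has more rows for value 2
x1 <- data.frame(V1 = c(1, 1, 1, 2, 2, 4), V2 = c(4, 3, 6, 9, 2, 0))
x2 <- data.frame(V1 = c(1, 1, 2, 2, 2),    V2 = c(-3, -7, -3, -2, -8))

## Every value seen in either first column
keys <- sort(unique(c(x1$V1, x2$V1)))

## Rows per value in each data frame; factor() with explicit levels
## makes table() report a 0 for values that are absent
counts <- data.frame(
  key = keys,
  n1  = as.vector(table(factor(x1$V1, levels = keys))),
  n2  = as.vector(table(factor(x2$V1, levels = keys)))
)

## Question 1: values present in only one of the two data frames
only.in.one <- counts$key[counts$n1 == 0 | counts$n2 == 0]

## Question 2: values for which x2 has more rows than x1
x2.has.more <- counts$key[counts$n2 > counts$n1]
```

Printing counts shows the per-value row counts side by side, which makes it easy to see which rows would have to be dropped.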

I added extra rows to your example data frames to test my assumptions 
about the answers.


After trying to be clever, I decided the easiest way is brute force.

Hopefully, this is what you want:

x1 <- as.data.frame( matrix(
c(
1, 4,
1, 3,
1, 6,
2, 9,
2, 2,
2, 5,
3, 6,
3, 7,
3, 4,
4, 0,
4, 1) , byrow=TRUE,ncol=2))

x2 <- as.data.frame( matrix(
c(
1, -3,
1, -7,
2, -3,
2, -2,
2, -8,
3, -1,
3, -2,
3, -1,
4, 0,
4, 1,
4, 2,
4, 3) , byrow=TRUE,ncol=2))

###
ivals <- sort(unique(c(x1$V1,x2$V1)))

## start empty, so it does not matter which value yields the first rows
newx <- NULL
for (i in ivals) {
   tmpx1 <- x1[x1$V1 == i , ]
   tmpx2 <- x2[x2$V1 == i , ]
   ## keep only as many rows as both data frames have for this value
   n.to.use <- min( nrow(tmpx1), nrow(tmpx2))
   if (n.to.use >= 1 ) {
     rtmp <- seq_len(n.to.use)
     tmpnew <- cbind( tmpx1[rtmp, ], V3=tmpx2[rtmp,'V2'])
     ## rbind(NULL, tmpnew) just gives tmpnew on the first pass
     newx <- rbind( newx, tmpnew)
   }
}


The loop could be written with fewer lines of code, but I found it 
easier to read and understand this way.
If x1 and x2 have a very large number of rows, the above should 
probably be revised for better memory usage, because growing newx 
with rbind() inside the loop copies it on every iteration.
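
One possible loop-free revision, offered only as a sketch and not tested on large data: give each row a counter within its group, then merge() on the value and the counter together, so the join is one-to-one and unmatched rows drop out. The helper column idx and the small data frames are invented for the illustration; ave() and merge() are base R.

```r
## Hypothetical cut-down data, column names as in the example above
x1 <- data.frame(V1 = c(1, 1, 1, 2, 2), V2 = c(4, 3, 6, 9, 2))
x2 <- data.frame(V1 = c(1, 1, 2, 2, 2), V2 = c(-3, -7, -3, -2, -8))

## Within-group row counter: 1, 2, 3, ... restarting for each value of V1
x1$idx <- ave(seq_along(x1$V1), x1$V1, FUN = seq_along)
x2$idx <- ave(seq_along(x2$V1), x2$V1, FUN = seq_along)

## Rename x2's value column so the result has V2 and V3, as in the loop
names(x2)[names(x2) == "V2"] <- "V3"

## Inner join on (V1, idx): only rows present in both data frames survive
newx <- merge(x1, x2, by = c("V1", "idx"))

## Restore group order and drop the helper column
newx <- newx[order(newx$V1, newx$idx), c("V1", "V2", "V3")]
```

Here newx has 4 rows: the third row for value 1 in x1 and the third row for value 2 in x2 have no partner and are dropped. Note that this is a use of merge() after all, but only once the extra counter column has turned the many-to-many match into a one-to-one match.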

-Don

At 2:33 AM +0200 6/18/09, Martin Batholdy wrote:
>hi,
>
>
>I have two data.frames each with two columns;
>
>x1
>
>1	4
>1	3
>1	6
>2	9
>2	2
>2	5
>3	6
>3	7
>3	4
>
>
>x2
>
>1	-3
>1	-7
>2	-3
>2	-2
>2	-8
>3	-1
>3	-2
>3	-1
>
>now I want to merge these data.frames into one data.frame.
>
>The problem is that sometimes there is a different number of 
>elements per category.
>(like above, x1 has 3 values for the value 1 in the first column, but x2 
>has only 2 values for the value 1 in the first column).
>
>Is there an easy way to merge these two data.frames by deleting the 
>rows that only one data.frame "has"?
>In the example, the resulting data.frame would combine x1 and x2, 
>except for row 3 of data.frame x1.
>
>thanks for any suggestions!


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



