[R] combining dataframes with different numbers of columns

Denis Chabot chabotd at globetrotter.net
Wed Nov 8 03:30:08 CET 2006


Dear list members,

I have to combine dataframes together. However they contain different  
numbers of variables. It is possible that all the variables in the  
dataframe with fewer variables are contained in the dataframe with  
more variables, though it is not always the case.

There are key variables identifying observations. These could be used  
in a merge statement, although this won't quite work for me (see below).

I was hoping to find a way to combine dataframes where I needed only  
to ensure the key variables were present. The total number of  
variables in the final dataframe would be the total number of  
different variables in both initial dataframes. Variables that were  
absent in one dataframe would automatically get missing values in the  
joint dataframe.

Here is a simple example. The initial dataframes are a and b. All  
variables in b are also in a.

a <- data.frame(X=seq(1,10), matrix(runif(100, 0,15), ncol=10))
b <- data.frame(X=seq(16,20), X4=runif(5,0,15))

A merge does not work because the common variable X4 becomes 2  
variables, X4.x and X4.y.

c <- merge(a,b,by="X", all=T)

This can be fixed but it requires several steps (although my solution  
is probably not optimal):

names(c)[5] <- "X4"
c$X4[is.na(c$X4)] <- c$X4.y[is.na(c$X4)]
c <- c[,1:11]

One quickly becomes tired with this solution with my real-life  
dataframes where different columns would require "repair" from one  
case to the next.

I think I still prefer making the narrower dataframe like the wider one:

b2 <- upData(b, X1=NA, X2=NA, X3=NA, X5=NA, X6=NA, X7=NA, X8=NA,  
X9=NA, X10=NA)
b2 <- b2[,c(1, 3:5, 2, 6:11)]

d <- rbind(a, b2)

But again this requires quite a bit of fine-tuning from one case to  
the next in my real-life dataframes.

I suspect R has a neat way to do this and I just cannot come up with  
the proper search term to find help on my own.

Or this can be automated: can one compare variable lists from 2  
dataframes and add missing variables in the "narrower" dataframe?

Ideally, the solution would be able to handle the situation where the  
narrower dataframe contains one or more variables that are absent  
from the wider one. If this was the case, I'd like the new variable  
to be present in the combined dataframe, with missing values given to  
the observations from the wider dataframe.

Thanks in advance,

Denis Chabot



More information about the R-help mailing list