[R] combining dataframes with different numbers of columns

Wed Nov 8 04:24:10 CET 2006

Thanks you very much Hadley, Stephen, and Sundar, your suggestions  
all solve my problem, even if the narrower dataframe contains  
variables that are new to the wider dataframe.

I'm sure glad I took the time to write instead of pursuing with my  
ugly and time-consuming solution.

Denis

> Dear list members,
>
> I have to combine dataframes together. However they contain  
> different numbers of variables. It is possible that all the  
> variables in the dataframe with fewer variables are contained in  
> the dataframe with more variables, though it is not always the case.
>
> There are key variables identifying observations. These could be  
> used in a merge statement, although this won't quite work for me  
> (see below).
>
> I was hoping to find a way to combine dataframes where I needed  
> only to ensure the key variables were present. The total number of  
> variables in the final dataframe would be the total number of  
> different variables in both initial dataframes. Variables that were  
> absent in one dataframe would automatically get missing values in  
> the joint dataframe.
>
> Here is a simple example. The initial dataframes are a and b. All  
> variables in b are also in a.
>
> a <- data.frame(X=seq(1,10), matrix(runif(100, 0,15), ncol=10))
> b <- data.frame(X=seq(16,20), X4=runif(5,0,15))
>
> A merge does not work because the common variable X4 becomes 2  
> variables, X4.x and X4.y.
>
> c <- merge(a,b,by="X", all=T)
>
> This can be fixed but it requires several steps (although my  
> solution is probably not optimal):
>
> names(c)[5] <- "X4"
> c$X4[is.na(c$X4)] <- c$X4.y[is.na(c$X4)]
> c <- c[,1:11]
>
> One quickly becomes tired with this solution with my real-life  
> dataframes where different columns would require "repair" from one  
> case to the next.
>
> I think I still prefer making the narrower dataframe like the wider  
> one:
>
> b2 <- upData(b, X1=NA, X2=NA, X3=NA, X5=NA, X6=NA, X7=NA, X8=NA,  
> X9=NA, X10=NA)
> b2 <- b2[,c(1, 3:5, 2, 6:11)]
>
> d <- rbind(a, b2)
>
> But again this requires quite a bit of fine-tuning from one case to  
> the next in my real-life dataframes.
>
> I suspect R has a neat way to do this and I just cannot come up  
> with the proper search term to find help on my own.
>
> Or this can be automated: can one compare variable lists from 2  
> dataframes and add missing variables in the "narrower" dataframe?
>
> Ideally, the solution would be able to handle the situation where  
> the narrower dataframe contains one or more variables that are  
> absent from the wider one. If this was the case, I'd like the new  
> variable to be present in the combined dataframe, with missing  
> values given to the observations from the wider dataframe.
>
> Thanks in advance,
>
> Denis Chabot
>
>