[R] Concatenating data frames with partial overlap in variable names

Sun Mar 25 04:16:29 CEST 2007

on 03/24/2007 10:00 PM Marc Schwartz said the following:
> On Sat, 2007-03-24 at 21:47 -0400, Daniel Folkinshteyn wrote:
>> Greetings to all.
>> I need to concatenate data frames that do not have all the same variable
>> names; there is only a partial overlap in the variables. So, for
>> example, if I have two data frames, a and b, that look like the following:
>>> a
>>   a b
>> 1 1 4
>> 2 2 5
>> 3 3 6
>> 4 4 7
>> 5 5 8
>>> b
>>   c  a
>> 1 1 10
>> 2 2 11
>> 3 3 12
>> 4 4 13
>> 5 5 14
>>
>> I want to concatenate them by row, without any matching, so that the
>> variables that are not present in every frame are filled with NA. The
>> result should look like:
>>
>>     a  b  c
>> 1   1  4 NA
>> 2   2  5 NA
>> 3   3  6 NA
>> 4   4  7 NA
>> 5   5  8 NA
>> 6  10 NA  1
>> 7  11 NA  2
>> 8  12 NA  3
>> 9  13 NA  4
>> 10 14 NA  5
>>
>> rbind doesn't work, since it requires all variables to be matched
>> between the two data frames. merge doesn't work, since it wants to
>> /match/ by columns with the same name, and if matching by nothing,
>> produces a Cartesian product.
>>
>> Is there a neat trick for doing this simply, or am I stuck with
>> comparing variable lists and generating NAs manually?
>>
>> I would appreciate any help!
>> Daniel
> 
> You can use merge():
> 
>> a
>   a b
> 1 1 4
> 2 2 5
> 3 3 6
> 4 4 7
> 5 5 8
> 
>> b
>   c  a
> 1 1 10
> 2 2 11
> 3 3 12
> 4 4 13
> 5 5 14
> 
> 
> Use 'a' as the common 'by' column and specify 'all = TRUE' so that
> non-matching values of 'a' will be included in the result:
> 
> 
>> merge(a, b, by = "a", all = TRUE)
>     a  b  c
> 1   1  4 NA
> 2   2  5 NA
> 3   3  6 NA
> 4   4  7 NA
> 5   5  8 NA
> 6  10 NA  1
> 7  11 NA  2
> 8  12 NA  3
> 9  13 NA  4
> 10 14 NA  5
> 
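The manual approach asked about in the original question (compare the variable lists and generate NAs) takes only a few lines of base R. The sketch below uses a hypothetical helper, rbind_fill_sketch(), which is not part of R: it adds each missing column as NA and then calls rbind(), so rows are stacked and never matched. It uses the data from the first example (b$a = 10:14).

```r
# Stack two data frames by row, filling variables that one frame lacks with NA.
# rbind_fill_sketch() is a hypothetical helper for illustration, not a base R function.
rbind_fill_sketch <- function(x, y) {
  all_vars <- union(names(x), names(y))                  # every variable in either frame
  for (v in setdiff(all_vars, names(x))) x[[v]] <- NA    # add columns missing from x
  for (v in setdiff(all_vars, names(y))) y[[v]] <- NA    # add columns missing from y
  rbind(x[all_vars], y[all_vars])                        # same columns, same order
}

a <- data.frame(a = 1:5, b = 4:8)
b <- data.frame(c = 1:5, a = 10:14)

res <- rbind_fill_sketch(a, b)
print(res)
```

Because nothing is matched, this still yields 10 rows when the two frames share values of a. Later add-on packages offer the same operation ready-made, for example rbind.fill() in plyr and bind_rows() in dplyr.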
Thanks for your quick response. Unfortunately, this is still not quite
what I have in mind (though maybe it's my fault for not making this
clear enough). Even if the two data frames happen to share some values
of 'a', I still want those records to remain separate rather than be
merged. For instance, using merge produces the following:
> a = data.frame(a=1:5, b=4:8)
> a
  a b
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
> b = data.frame(c=1:5, a=4:8)
> b
  c a
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
> merge(a,b,by='a',all=T)
  a  b  c
1 1  4 NA
2 2  5 NA
3 3  6 NA
4 4  7  1
5 5  8  2
6 6 NA  3
7 7 NA  4
8 8 NA  5

whereas I want it to produce 10 separate rows, because they are
separate observations; each one simply lacks the variable that its
own data frame does not record.
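If merge() is preferred, one way to stop it from matching rows across the two frames is sketched below, under the assumption that an extra key column is acceptable: give each frame a constant tag column with a different value in each frame and include it in by, so no key from a can equal a key from b, and all = TRUE then keeps every row. The column name src is hypothetical, introduced only for this illustration; the data is the second example above.

```r
# Variant of the merge() call that never matches rows across the two frames.
# "src" is a hypothetical helper column used only for this sketch.
a <- data.frame(a = 1:5, b = 4:8)
b <- data.frame(c = 1:5, a = 4:8)

a$src <- "a"   # the same tag on every row of a
b$src <- "b"   # a different tag on every row of b

# With src in the key, no row of a can equal a row of b, so
# all = TRUE keeps all 10 rows and fills the missing variables with NA.
res <- merge(a, b, by = c("a", "src"), all = TRUE)
res$src <- NULL   # drop the helper column
print(res)
```

merge() sorts the result on the key columns by default, so rows from the two frames are interleaved by a; if the original order matters, stacking with rbind() after adding the missing columns avoids that sort.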

Thanks,
Daniel