[R] Contatenating data frames with partial overlap in variable names
Daniel Folkinshteyn
dfolkins at temple.edu
Sun Mar 25 04:16:29 CEST 2007
on 03/24/2007 10:00 PM Marc Schwartz said the following:
> On Sat, 2007-03-24 at 21:47 -0400, Daniel Folkinshteyn wrote:
>> Greetings to all.
>> I need to concatenate data frames that do not have all the same variable
>> names, there is only a partial overlap in the variables. So, for
>> example, if i have two data frames, a and b, that look like the following:
>>> a
>> a b
>> 1 1 4
>> 2 2 5
>> 3 3 6
>> 4 4 7
>> 5 5 8
>>> b
>> c a
>> 1 1 10
>> 2 2 11
>> 3 3 12
>> 4 4 13
>> 5 5 14
>>
>> i want to concatenate them by row, without any matching, so that the
>> variables that are not available in all frames get NAs. The result
>> should look like:
>>
>> a b c
>> 1 1 4 NA
>> 2 2 5 NA
>> 3 3 6 NA
>> 4 4 7 NA
>> 5 5 8 NA
>> 6 10 NA 1
>> 7 11 NA 2
>> 8 12 NA 3
>> 9 13 NA 4
>> 10 14 NA 5
>>
>> rbind doesn't work, since it requires all variables to be matched
>> between the two data frames. merge doesn't work, since it wants to
>> /match/ by columns with the same name, and if matching by nothing,
>> produces a cartesian product.
>>
>> is there a neat trick for doing this simply, or am i stuck with
>> comparing variable lists and generating NAs manually?
>>
>> would appreciate any help!
>> Daniel
>
> You can use merge():
>
>> a
> a b
> 1 1 4
> 2 2 5
> 3 3 6
> 4 4 7
> 5 5 8
>
>> b
> c a
> 1 1 10
> 2 2 11
> 3 3 12
> 4 4 13
> 5 5 14
>
>
> Use 'a' as the common 'by' column and specify 'all = TRUE' so that
> non-matching values of 'a' will be included in the result:
>
>
>> merge(a, b, by = "a", all = TRUE)
> a b c
> 1 1 4 NA
> 2 2 5 NA
> 3 3 6 NA
> 4 4 7 NA
> 5 5 8 NA
> 6 10 NA 1
> 7 11 NA 2
> 8 12 NA 3
> 9 13 NA 4
> 10 14 NA 5
>
Thanks for your quick response. Unfortunately, this is still not quite
what I have in mind (though maybe it's my fault for not making this too
clear). Even if the two data frames happen to have some values of 'a'
that match, I still want those records to remain separate, rather than
merge. So, for instance, using merge will produce the following:
> a = data.frame(a=1:5, b=4:8)
> a
a b
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
> b = data.frame(c=1:5, a=4:8)
> b
c a
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
> merge(a,b,by='a',all=T)
a b c
1 1 4 NA
2 2 5 NA
3 3 6 NA
4 4 7 1
5 5 8 2
6 6 NA 3
7 7 NA 4
8 8 NA 5
whereas I would still want it to produce 10 separate rows, because they
are separate observations, it's just that one of them happens to be
missing a variable.
Thanks,
Daniel
More information about the R-help
mailing list