[R] merging or joining 2 dataframes: merge, rbind.fill, etc.?

David Kulp dkulp at fiksu.com
Wed Feb 27 04:05:40 CET 2013


On Feb 26, 2013, at 9:33 PM, Anika Masters <anika.masters at gmail.com> wrote:

> Thanks Arun and David.  I am also running into memory issues when
> one of the data frames I'm trying to rbind to or merge with is
> "very large".  (This is a recurring problem, as I am trying to
> merge/rbind thousands of small dataframes into a single "very
> large" dataframe.)
>
>
>
> I'm thinking of creating a function that creates an empty dataframe to
> which I can add data, but will need to first determine and ensure that
> each dataframe has the exact same columns, in the exact same
> "location".
>
>
>
> Before I write any new code, are there any pre-existing functions or
> code that might solve this problem of merging small or medium-sized
> dataframes with a "very large" dataframe?

Consider plyr: its rbind.fill() accepts a list of data frames and fills
missing columns with NA. Memory can be a problem, but it's a piece of
cake to write a one-liner that iterates over a list of data frames and
returns them all rbind'd together.  Or, if the columns already match,
just: do.call(rbind, list.of.data.frames).
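
A minimal sketch of that base R one-liner, assuming every data frame in
the list has the same set of column names (rbind on data frames matches
columns by name, so the order may differ). The name list.of.data.frames
follows the text above; the contents are made up for illustration.

```r
# Hypothetical input: several small data frames with the same columns,
# not necessarily in the same order.
list.of.data.frames <- list(
  data.frame(a = 7, b = 99, d = 12),
  data.frame(d = 88, b = 34, a = 1),
  data.frame(a = 2, b = 3, d = 4)
)

# A single rbind call over all pieces, instead of growing the result
# one data frame at a time inside a loop.
mydf <- do.call(rbind, list.of.data.frames)
mydf
```

If the column sets differ, this call fails, and rbind.fill from plyr
(which also takes a list as its first argument) is the closer fit.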

If memory is a serious problem then I think it's best to write your
own code that fills a preallocated data frame by row index, which
avoids re-copying the entire accumulated data frame on every append.
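
A rough sketch of the index-based approach, assuming the total row
count and the full set of columns can be computed up front and that all
columns are numeric. The names pieces, all.cols, n.total and big are
hypothetical. Whether this really saves memory depends on the R
version, since assignment into a data frame can itself trigger copies.

```r
# Hypothetical input: pieces with different column sets.
pieces <- list(
  data.frame(a = 7, b = 99, d = 12),
  data.frame(d = 88, b = 34, x = 12, y = 44, c = 56)
)

# Union of column names, in order of first appearance.
all.cols <- unique(unlist(lapply(pieces, names)))
n.total  <- sum(sapply(pieces, nrow))

# Preallocate the result: one numeric NA column per name.
big <- as.data.frame(matrix(NA_real_, nrow = n.total,
                            ncol = length(all.cols),
                            dimnames = list(NULL, all.cols)))

next.row <- 1
for (p in pieces) {
  rows <- seq(next.row, length.out = nrow(p))
  # Fill only the columns this piece has; the rest stay NA.
  for (col in names(p)) big[rows, col] <- p[[col]]
  next.row <- next.row + nrow(p)
}
big
```

Preallocating one plain vector per column and calling data.frame()
once at the end is a common alternative with less data frame overhead.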

>
> On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>> Clumsy but it doesn't require any packages:
>>
>> merge2 <- function(x, y) {
>>   # rbind when both frames have the same column names, else full outer merge
>>   if (setequal(names(x), names(y))) {
>>     rbind(x, y)
>>   } else merge(x, y, all=TRUE)
>> }
>> merge2(df1, df2)
>> df3 <- df1
>> merge2(df1, df3)
>>
>> ----------------------------------------------
>> David L Carlson
>> Associate Professor of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>> project.org] On Behalf Of arun
>>> Sent: Tuesday, February 26, 2013 1:14 PM
>>> To: Anika Masters
>>> Cc: R help
>>> Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill,
>>> etc.?
>>>
>>> Hi,
>>>
>>> You could also try:
>>> library(gtools)
>>> smartbind(df2,df1)
>>> #  a  b  d
>>> #1 7 99 12
>>> #2 7 99 12
>>>
>>>
>>> When df1!=df2
>>> smartbind(df1,df2)
>>> #   a  b  d  x  y  c
>>> #1  7 99 12 NA NA NA
>>> #2 NA 34 88 12 44 56
>>> A.K.
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: Anika Masters <anika.masters at gmail.com>
>>> To: r-help at r-project.org
>>> Cc:
>>> Sent: Tuesday, February 26, 2013 1:55 PM
>>> Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?
>>>
>>> #I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
>>> (mydf).  I want the 3rd dataframe to contain 1 row for each row in df1
>>> & df2, and all the columns in both df1 & df2. The solution should
>>> "work" even if the 2 dataframes are identical, and even if the 2
>>> dataframes do not have the same column names.  The rbind.fill function
>>> seems to work.  For learning purposes, are there other "good" ways to
>>> solve this problem, using merge or other functions other than
>>> rbind.fill?
>>>
>>> #e.g. These 2 examples both seem to "work" correctly and as I hoped:
>>>
>>> df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
>>> list( NULL ,  c('a' , 'b' , 'd') ) ) )
>>> df2 <- data.frame(matrix(data=c(88, 34, 12, 44, 56) ,  nrow=1 ,
>>> dimnames = list( NULL ,  c('d' , 'b' , 'x' ,  'y', 'c') ) ) )
>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>> mydf
>>>
>>> #e.g. this works:
>>> library(reshape)
>>> mydf <- rbind.fill(df1, df2)
>>> mydf
>>>
>>> #But this does not (the 2 dataframes are identical)
>>> df1 <- data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
>>> list( NULL ,  c('a' , 'b' , 'd') ) ) )
>>> df2 <- df1
>>> mydf <- merge(df2, df1, all.y=T, all.x=T)
>>> mydf
>>>
>>> #Any way to get "merge" to work for this final example? Any other good
>>> solutions?
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>


