[R] Merging data frames on a variety of columns

Chris Poliquin chrispoliquin at gmail.com
Fri Sep 17 22:12:46 CEST 2010


Hello,

This is a semi-complicated question about comparing two datasets,
probably using merge(), but I am open to other ideas. I have a large
data frame of information about companies. It has over 30,000 rows and
looks something like this:

df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
identifier1  identifier2  name    other_name  year
H34          C56          ACME    ACME_LTD    2001
H34          NA           ACME    ACME_LTD    2002
X20          C40          FOO_CO  FOO_CO      2004
NA           NA           BAR_SA  BAR_SAB     2004
NA           NA           BAR_SA  BAR_SAB     2005
")

As you can see, many observations have missing values.
I have a second data frame with information about these same
companies, with fewer rows and often with slightly different details:

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
identifier1  identifier2  name      year
H34          NA           ACME_LTD  2001
H34          NA           ACME_LTD  2002
X20          C40          FOO       2004
")

The idea is to figure out which companies in the first set are not in
the second set.  My approach so far is to do various merges and then
remove the matches from the original data frame...

m1 <- merge(df1, df2, by = c("identifier1", "identifier2", "year"),
            incomparables = NA)
m2 <- merge(df1, df2, by = c("name", "year"), incomparables = NA)
m3 <- merge(df1, df2, by.x = c("other_name", "year"),
            by.y = c("name", "year"), incomparables = NA)
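A note on incomparables = NA in the calls above: the merge() documentation describes incomparables as intended for merging on one column, and, as far as I know, recent R versions stop with an error when it is combined with several by columns. Without it, merge() treats NA in a key like any other value, so (H34, NA, 2002) in df1 pairs with (H34, NA, 2002) in df2. A minimal sketch on the toy data from this post, assuming the goal is to ignore rows with any missing key, is to filter them out with complete.cases() before merging:

```r
# Toy data from the post (stringsAsFactors = FALSE keeps keys as character)
df1 <- data.frame(
  identifier1 = c("H34", "H34", "X20", NA, NA),
  identifier2 = c("C56", NA, "C40", NA, NA),
  name        = c("ACME", "ACME", "FOO_CO", "BAR_SA", "BAR_SA"),
  other_name  = c("ACME_LTD", "ACME_LTD", "FOO_CO", "BAR_SAB", "BAR_SAB"),
  year        = c(2001, 2002, 2004, 2004, 2005),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  identifier1 = c("H34", "H34", "X20"),
  identifier2 = c(NA, NA, "C40"),
  name        = c("ACME_LTD", "ACME_LTD", "FOO"),
  year        = c(2001, 2002, 2004),
  stringsAsFactors = FALSE
)

keys <- c("identifier1", "identifier2", "year")

# Plain multi-column merge: NA is matched to NA like an ordinary value,
# so (H34, NA, 2002) pairs up in addition to (X20, C40, 2004)
m_raw <- merge(df1, df2, by = keys)
print(nrow(m_raw))

# Keep only rows whose key columns are all non-missing, then merge
m1 <- merge(df1[complete.cases(df1[keys]), ],
            df2[complete.cases(df2[keys]), ],
            by = keys)
print(m1)
```

The same filtering would apply to the name-based merges if those columns can also be NA.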


Is this really the best way to accomplish my goal?
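One possible alternative to merging and then removing rows, offered as a sketch rather than a definitive answer: compute, for each row of df1, a logical flag saying whether its key appears in df2 under each of the three key sets, OR the flags together, and keep the rows where no flag is set (an anti-join). key_match() is a hypothetical helper written only for this example; it pastes the key columns into one string using "\r" as a separator (assumed never to occur in the data) and treats rows with a missing key value as unmatched.

```r
df1 <- data.frame(
  identifier1 = c("H34", "H34", "X20", NA, NA),
  identifier2 = c("C56", NA, "C40", NA, NA),
  name        = c("ACME", "ACME", "FOO_CO", "BAR_SA", "BAR_SA"),
  other_name  = c("ACME_LTD", "ACME_LTD", "FOO_CO", "BAR_SAB", "BAR_SAB"),
  year        = c(2001, 2002, 2004, 2004, 2005),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  identifier1 = c("H34", "H34", "X20"),
  identifier2 = c(NA, NA, "C40"),
  name        = c("ACME_LTD", "ACME_LTD", "FOO"),
  year        = c(2001, 2002, 2004),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows of x whose values in x_cols occur as a
# key in y's y_cols. Rows with any missing key value never count as a match.
key_match <- function(x, y, x_cols, y_cols = x_cols) {
  # Collapse several key columns into one "\r"-separated string per row
  paste_key <- function(d, cols) {
    do.call(paste, c(unname(as.list(d[cols])), sep = "\r"))
  }
  ok_x <- complete.cases(x[x_cols])
  ok_y <- complete.cases(y[y_cols])
  ok_x & paste_key(x, x_cols) %in% paste_key(y, y_cols)[ok_y]
}

# A row counts as "in df2" if any of the three key sets matches
matched <-
  key_match(df1, df2, c("identifier1", "identifier2", "year")) |
  key_match(df1, df2, c("name", "year")) |
  key_match(df1, df2, c("other_name", "year"), c("name", "year"))

# Companies in df1 with no match under any of the three key sets
not_in_df2 <- df1[!matched, ]
print(not_in_df2)
```

Because this only tests membership with %in%, it never duplicates rows of df1 or renames columns the way merge() can; the trade-off is that it tells you which rows matched, not which df2 row they matched.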

Also, for some reason, when I do merges like m3 the resulting data
frame is missing columns, and I get rows that do not appear to
match on the variables I specified, for example:

year  other_name  identifier1  name          identifier2
2001  AMDOCS_LTD  G0260210     AMDOCS_LTDED  C000042913
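A small sketch, using the toy df1/df2 from this post, of what merge() does with column names in a call shaped like m3 (incomparables is left out here). The key columns take df1's names (other_name, year); df2's name is consumed as a key and does not appear as a separate column; non-key columns present in both frames (identifier1, identifier2) get the default suffixes .x and .y. So the name column in the result is df1's own name, not df2's, which may be why some rows look as though they do not match.

```r
df1 <- data.frame(
  identifier1 = c("H34", "H34", "X20", NA, NA),
  identifier2 = c("C56", NA, "C40", NA, NA),
  name        = c("ACME", "ACME", "FOO_CO", "BAR_SA", "BAR_SA"),
  other_name  = c("ACME_LTD", "ACME_LTD", "FOO_CO", "BAR_SAB", "BAR_SAB"),
  year        = c(2001, 2002, 2004, 2004, 2005),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  identifier1 = c("H34", "H34", "X20"),
  identifier2 = c(NA, NA, "C40"),
  name        = c("ACME_LTD", "ACME_LTD", "FOO"),
  year        = c(2001, 2002, 2004),
  stringsAsFactors = FALSE
)

# Same key pairing as m3: df1$other_name is matched against df2$name
m3 <- merge(df1, df2, by.x = c("other_name", "year"), by.y = c("name", "year"))

# Key columns keep df1's names; clashing non-key columns get ".x" / ".y"
print(names(m3))

# "name" here is df1's own name column; df2's name matched other_name
print(m3[, c("other_name", "year", "name")])
```

If different suffixes are easier to read, merge() accepts them through its suffixes argument, e.g. suffixes = c(".df1", ".df2").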


Help is much appreciated,
Chris
