[R] Merging data frames on a variety of columns
Chris Poliquin
chrispoliquin at gmail.com
Fri Sep 17 22:12:46 CEST 2010
Hello,
This is a semi-complicated question about comparing two datasets,
probably using merge, but I am open to other ideas. I have a large
data frame of information about companies. It has over 30,000 rows and
looks something like...
df1 <-
identifier1  identifier2  name    other_name  year
H34          C56          ACME    ACME_LTD    2001
H34          NA           ACME    ACME_LTD    2002
X20          C40          FOO_CO  FOO_CO      2004
NA           NA           BAR_SA  BAR_SAB     2004
NA           NA           BAR_SA  BAR_SAB     2005
As you can see, many observations are missing values.
I have a second data frame with information about these same
companies, in fewer rows, and often with slightly different info...
df2 <-
identifier1  identifier2  name      year
H34          NA           ACME_LTD  2001
H34          NA           ACME_LTD  2002
X20          C40          FOO       2004
The idea is to figure out which companies in the first set are not in
the second set. My approach so far is to do various merges and then
remove the matches from the original data frame...
m1 <- merge(df1, df2, by = c("identifier1", "identifier2", "year"),
            incomparables = NA)
m2 <- merge(df1, df2, by = c("name", "year"), incomparables = NA)
m3 <- merge(df1, df2, by.x = c("other_name", "year"),
            by.y = c("name", "year"), incomparables = NA)
Is this really the best way to accomplish my goal?
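For reference, a minimal sketch of the "remove the matches" step on the
toy rows above. It is only an illustration under stated assumptions: it
flags matches with %in% on pasted keys rather than with merge(), the
helper name has_match is made up for this example, and it assumes that a
row with NA in any key column should never count as a match.

```r
# Toy versions of the two frames, copied from the example rows above.
df1 <- data.frame(
  identifier1 = c("H34", "H34", "X20", NA, NA),
  identifier2 = c("C56", NA, "C40", NA, NA),
  name        = c("ACME", "ACME", "FOO_CO", "BAR_SA", "BAR_SA"),
  other_name  = c("ACME_LTD", "ACME_LTD", "FOO_CO", "BAR_SAB", "BAR_SAB"),
  year        = c(2001, 2002, 2004, 2004, 2005),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  identifier1 = c("H34", "H34", "X20"),
  identifier2 = c(NA, NA, "C40"),
  name        = c("ACME_LTD", "ACME_LTD", "FOO"),
  year        = c(2001, 2002, 2004),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row of x whose key columns (by.x)
# all equal the key columns (by.y) of at least one row of y.
# Assumption: rows with NA in any key column never match.
has_match <- function(x, y, by.x, by.y = by.x) {
  key <- function(d, cols) {
    do.call(paste, c(unname(as.list(d[cols])), sep = "\r"))
  }
  x_ok <- complete.cases(x[by.x])
  y_ok <- complete.cases(y[by.y])
  x_ok & key(x, by.x) %in% key(y, by.y)[y_ok]
}

# A row of df1 counts as matched if any of the three key sets matches.
matched <- has_match(df1, df2, c("identifier1", "identifier2", "year")) |
  has_match(df1, df2, c("name", "year")) |
  has_match(df1, df2, c("other_name", "year"), c("name", "year"))

# Companies in df1 with no counterpart in df2.
unmatched <- df1[!matched, ]
print(unmatched)
```

On the toy data this leaves only the two BAR_SA rows. Working with
logical flags keeps df1's rows and columns intact, so nothing has to be
subtracted back out of merged frames afterwards.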
Also, for some reason when I do merges like m3, my resulting data
frame is missing columns and I am getting rows that do not appear to
match on the variables I have specified, e.g. ...
year  other_name  identifier1  name          identifier2
2001  AMDOCS_LTD  G0260210     AMDOCS_LTDED  C000042913
Help is much appreciated,
Chris