[R] dataframe selection using a multi-value key
Erik Iverson
eriki at ccbr.umn.edu
Wed Sep 8 03:01:16 CEST 2010
Hello,
On 09/07/2010 07:25 PM, Markus Weisner wrote:
> I am merging two dataframes using a relational key (incident number and
> incident year), but not all the records match up. I want to be able to
> review only the records that cannot be merged for each individual dataframe
> (essentially trying to select records from one dataframe using a multi-value
> relational key from the other dataframe). The following code shows what I
> am trying to do. The final two lines of code do not work, but if somebody
> could figure out a workable solution, that would be great. Thanks.
> --Markus
>
> incidents = data.frame(
> INC_NO = c(1,2,3,4,5,6,7,8,9,10),
> INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009,
> 2010),
> INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS",
> "FIRE", "EMS", "EMS"))
>
> responses = data.frame(
> INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10),
> INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008,
> 2018, 2018, 2018, 2009, 2010),
> UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1",
> "E3", "E7", "T1", "E7", "E5"))
>
> merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"))
>
> relational_key = c("INC_NO", "INC_YEAR")
>
> ## following does not work, but I want DF of incidents that did not merge up
> with responses
> incidents[incidents[,relational_key] %in% responses[,relational_key],]
>
> ## following does not work, but I want DF of responses that did not merge up
> with incidents
> responses[responses[,relational_key] %in% incidents[,relational_key],]
Surely there's a more elegant way... This function takes two
data.frames, and returns those elements that aren't merged
as a list, one element for each data.frame. You need to
specify a key.
compare <- function(df1, df2, key) {
md <- merge(df1, df2, by = key)
keys <- lapply(list(df1, df2, md),
function(x) do.call("paste", c(x[key], sep = "\r")))
mapply(function(x, y) x[!y %in% keys[[3]], ], list(df1, df2),
keys[-3], SIMPLIFY = FALSE)
}
> compare(responses, incidents, key = relational_key)
[[1]]
INC_NO INC_YEAR UNIT_TYPE
10 8 2018 E3
11 8 2018 E7
12 8 2018 T1
[[2]]
INC_NO INC_YEAR INC_TYPE
8 8 2008 FIRE
More information about the R-help
mailing list