[R] comparing 2 dataframes
Christoph Buser
buser at stat.math.ethz.ch
Tue Nov 7 08:46:29 CET 2006
Hi
Maybe this example can help you to find your solution:
dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
                                   "1004166834BR", "1004310897BR", "1006180BR",
                                   "10064798BR", "1007311BR", "1007621BR",
                                   "1008195BR", "10126BR", "95323994BR"),
                   CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4", "5+",
                                   "4", "4-", "5+", "4"))
dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
                                   "10166834BR", "107BR", "100BR", "164798BR",
                                   "1008195BR", "10126BR"),
                   CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4", "4",
                                   "5+"))
## Merge, but only by "CUSTOMER_ID"
datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
datM
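## Keep only the rows whose "CUSTOMER_RR" is identical in both data frames.
## The comparison must be row-wise (==), not %in%; as.character() avoids
## comparing factors with different level sets.
datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
              as.character(datM[, "CUSTOMER_RR.y"]), ]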
datM1
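
If the goal is the list of CUSTOMER_IDs whose rating differs between the two data frames (what the test() function in the question tries to collect in noteq), a vectorised sketch along the following lines may be enough. It assumes the ID and rating columns are stored as character, or are converted with as.character() before comparing, since `==` on two factors with different level sets raises the "level sets of factors are different" error quoted below. The names db1, db2, both, same and agree are made up for this illustration, and the data are a three-row subset of the values in this thread.

```r
# Illustrative data only: a small subset of the values used in this thread.
# stringsAsFactors = FALSE keeps the columns as character vectors.
db1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1008195BR", "10126BR"),
                  CUSTOMER_RR = c("5+", "4-", "5+"),
                  stringsAsFactors = FALSE)
db2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1008195BR", "10126BR"),
                  CUSTOMER_RR = c("6+", "4", "5+"),
                  stringsAsFactors = FALSE)

# Pair up the rows that share a CUSTOMER_ID; the rating columns get suffixes.
both <- merge(db1, db2, by = "CUSTOMER_ID", suffixes = c(".db1", ".db2"))

# Row-wise comparison of the two ratings.
# as.character() is a guard in case the columns are factors after all.
same <- as.character(both$CUSTOMER_RR.db1) == as.character(both$CUSTOMER_RR.db2)

agree <- as.character(both$CUSTOMER_ID[same])   # same ID and same rating
noteq <- as.character(both$CUSTOMER_ID[!same])  # same ID, different rating
```

With these three rows, noteq should contain only "1008195BR" and agree only "10126BR". This replaces the double loop with one merge and one vectorised comparison.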
Regards,
Christoph
--------------------------------------------------------------
Credit and Surety PML study: visit our web page www.cs-pml.org
--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
Priya Kanhai writes:
> Hi,
>
> I've a question about comparing 2 dataframes: RRC_db1 and RRC_db2 of
> different length.
>
> For example:
>
> RRC_db1:
>
>     CUSTOMER_ID   CUSTOMER_RR
> 1   1000786BR     5+
> 2   1002047BR     4
> 3   10127BR       5+
> 4   1004166834BR  2
> 5   1004310897BR  X
> 6   1006180BR     4
> 7   10064798BR    4
> 8   1007311BR     5+
> 9   1007621BR     4
> 10  1008195BR     4-
> 11  10126BR       5+
> 12  95323994BR    4
>
> RRC_db2:
>
>     CUSTOMER_ID   CUSTOMER_RR
> 1   1200786BR     6+
> 2   1802047BR     4
> 3   1027BR        1+
> 4   10166834BR    2
> 5   107BR         X
> 6   100BR         4
> 7   164798BR      4
> 8   1008195BR     4-
> 9   10126BR       5+
>
>
> I want to pick the CUSTOMER_ID of RRC_db1 which also exist in RRC_db2:
> third <- merge(RRC_db1, RRC_db2) or third <- subset(RRC_db1, CUSTOMER_ID %in%
> RRC_db2$CUSTOMER_ID)
>
> But I also want to check if the CUSTOMER_RR is correct. I had tried this:
>
> > test <- function(RRC_db1, RRC_db2)
> + {
> +   noteq <- c()
> +   for (i in 1:length(RRC_db1$CUSTOMER_ID)) {
> +     for (j in 1:length(RRC_db2$CUSTOMER_ID)) {
> +       if (RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]) {
> +         if (RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]) {
> +           noteq <- c(noteq, RRC_db1$CUSTOMER_ID[i]);
> +         }
> +       }
> +     }
> +   }
> +   noteq;
> + }
> >
> > test(RRC_db1, RRC_db2)
> Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
> level sets of factors are different
>
>
> But then I got this error.
>
> I don't only want the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
>
> Can you please help me?
>
> Thanks in advance.
>
> Regards,
>
> Priya
>