[R] comparing 2 dataframes

Christoph Buser buser at stat.math.ethz.ch
Tue Nov 7 08:46:29 CET 2006


Hi

Maybe this example can help you to find your solution:

dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
                     "1004166834BR", "1004310897BR", "1006180BR",
                     "10064798BR", "1007311BR", "1007621BR",
                     "1008195BR", "10126BR", "95323994BR"),
                   CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4", "5+",
                     "4", "4-", "5+", "4"))

dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
                     "10166834BR", "107BR", "100BR", "164798BR", "1008195BR",
                     "10126BR"),
                   CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4", "4",
                     "5+"))

## Merge, but only by "CUSTOMER_ID"
datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
datM
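## Select only cases that have the same "CUSTOMER_RR" in both data frames.
## Compare row by row with "==" (after converting the factors to character);
## "%in%" would only test membership in the whole column.
datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
              as.character(datM[, "CUSTOMER_RR.y"]), ]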
datM1
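
The error in your function comes from comparing two factors with "==" when
their level sets differ. Keeping the columns as character vectors avoids it
(data.frame() converts strings to factors unless stringsAsFactors = FALSE;
from R 4.0.0 on, FALSE is the default). A minimal sketch of that idea, using a
small made-up subset of the data; the names dat1s, dat2s, datMs, both and
mismatch are only illustrative:

```r
## Sketch only: dat1s, dat2s, datMs, both and mismatch are illustrative names.
## stringsAsFactors = FALSE keeps the columns as character vectors, so "=="
## and "!=" compare the strings directly.
dat1s <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "95323994BR"),
                    CUSTOMER_RR = c("4-", "5+", "4"),
                    stringsAsFactors = FALSE)
dat2s <- data.frame(CUSTOMER_ID = c("1008195BR", "10126BR", "107BR"),
                    CUSTOMER_RR = c("4", "5+", "X"),
                    stringsAsFactors = FALSE)

## Rows that agree in both CUSTOMER_ID and CUSTOMER_RR
both <- merge(dat1s, dat2s, by = c("CUSTOMER_ID", "CUSTOMER_RR"))
both

## IDs present in both data frames whose CUSTOMER_RR differs
datMs <- merge(dat1s, dat2s, by = "CUSTOMER_ID")
mismatch <- datMs$CUSTOMER_ID[datMs$CUSTOMER_RR.x != datMs$CUSTOMER_RR.y]
mismatch
```

Merging on both columns gives the matching rows in one step, while the second
merge keeps both ratings side by side so that the differing IDs (what your
loop was collecting in "noteq") can be picked out without a loop.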

Regards,

Christoph

--------------------------------------------------------------

Credit and Surety PML study: visit our web page www.cs-pml.org

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------



Priya Kanhai writes:
 > Hi,
 > 
 > I've a question about comparing 2 dataframes: RRC_db1 and RRC_db2 of
 > different length.
 > 
 > For example:
 > 
 > RRC_db1:
 > 
 >     CUSTOMER_ID CUSTOMER_RR
 > 1     1000786BR                   5+
 > 2     1002047BR                    4
 > 3       10127BR                   5+
 > 4  1004166834BR                    2
 > 5  1004310897BR                    X
 > 6     1006180BR                    4
 > 7    10064798BR                    4
 > 8     1007311BR                   5+
 > 9     1007621BR                    4
 > 10    1008195BR                   4-
 > 11      10126BR                   5+
 > 12   95323994BR                    4
 > 
 >  RRC_db2:
 > 
 >     CUSTOMER_ID CUSTOMER_RR
 > 1     1200786BR                   6+
 > 2     1802047BR                    4
 > 3      1027BR                     1+
 > 4   10166834BR                    2
 > 5   107BR                          X
 > 6     100BR                        4
 > 7    164798BR                    4
 > 8    1008195BR                   4-
 > 9      10126BR                   5+
 > 
 > 
 > I want to pick the CUSTOMER_ID of RRC_db1 which also exist in RRC_db2:
 > third <- merge(RRC_db1, RRC_db2) or  third <-subset(RRC_db1, CUSTOMER_ID%in%
 > RRC_db2$CUSTOMER_ID)
 > 
 > But I also want to check if the CUSTOMER_RR is correct. I had tried this:
 > 
 > > test <- function(RRC_db1,RRC_db2)
 > + {
 > + noteq <- c()
 > + for( i in 1:length(RRC_db1$CUSTOMER_ID)){
 > + for( j in 1:length(RRC_db2$CUSTOMER_ID)){
 > + if(RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]){
 > + if(RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]){
 > + noteq <- c(noteq,RRC_db1$CUSTOMER_ID[i]);
 > + }
 > + }
 > + }
 > + }
 > + noteq;
 > + }
 > >
 > > test(RRC_db1, RRC_db2)
 > Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
 >         level sets of factors are different
 > 
 > 
 > But then I got this error.
 > 
 > I don't only want the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
 > 
 > Can you please help me?
 > 
 > Thanks in advance.
 > 
 > Regards,
 > 
 > Priya
 > 
 > 	[[alternative HTML version deleted]]
 > 
 > ______________________________________________
 > R-help at stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.


