[R] Compare two dataframes
Petr Savicky
savicky at cs.cas.cz
Sat Dec 18 10:00:18 CET 2010
Hi Mark:
> However, if the dataframe contains non-unique rows (two rows with
> exactly the same values in each column) then the unique function will
> delete one of them and that may not be desirable.
In order to get information about equal rows between two dataframes
without removing duplicated rows in each of them, it is possible to
use sorting. For example
n <- ncol(cars)
cars1 <- cbind(cars[1:35, ], df="df1")
cars2 <- cbind(cars[16:50, ], df="df2")
# all cases together, column "df" indicates origin of each case
cars.all <- rbind(cars1, cars2)
row.names(cars.all) <- seq(nrow(cars.all))
cars.sorted <- cars.all[do.call(order, cars.all), ]
# compute an index that is the same for rows that are equal except for the "df" component
index <- cumsum(1 - duplicated(cars.sorted[, 1:n]))
# for each index of a unique row, compute the number of occurrences in both dataframes
out <- table(index, cars.sorted$df)
out[15:20, ]
index df1 df2
15 1 0
16 1 1
17 2 2
18 1 1
19 1 1
20 1 1
This shows, for example, that the row with index 17 occurs twice in each of
the two dataframes. These rows can be obtained using
cars.sorted[index == 17, ]
speed dist df
17 13 34 df1
18 13 34 df1
37 13 34 df2
38 13 34 df2
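The steps above could be wrapped into a small function. The following is only a sketch: the name compare_rows and the result columns n1 and n2 are made up for illustration, and it assumes that both dataframes have the same columns and that none of them is called "df", "n1" or "n2".

```r
# Hypothetical helper: for every distinct row, count how many times it
# occurs in each of two dataframes with identical columns.
compare_rows <- function(df1, df2) {
  n <- ncol(df1)
  both <- rbind(cbind(df1, df = "df1"), cbind(df2, df = "df2"))
  # unname() keeps column names from being matched to arguments of order()
  sorted <- both[do.call(order, unname(as.list(both))), ]
  # same index for rows that are equal in the first n columns
  index <- cumsum(!duplicated(sorted[, 1:n, drop = FALSE]))
  # fixed factor levels, so both count columns exist even if one is all zero
  counts <- table(index, factor(sorted$df, levels = c("df1", "df2")))
  # one representative of each distinct row, with both counts attached
  result <- sorted[!duplicated(index), 1:n, drop = FALSE]
  result$n1 <- as.vector(counts[, "df1"])
  result$n2 <- as.vector(counts[, "df2"])
  rownames(result) <- NULL
  result
}

# small made-up dataframes, only to exercise the function
a <- data.frame(x = c(1, 1, 2, 3), y = c("p", "p", "q", "r"))
b <- data.frame(x = c(1, 2, 2, 4), y = c("p", "q", "q", "s"))
res <- compare_rows(a, b)
# rows whose number of occurrences differs between the two dataframes
res[res$n1 != res$n2, ]
```

No row is dropped in the process, so duplicated rows are counted and not removed.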
See also ?rle.
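One way rle might be used here, as a rough sketch on a single dataframe: after sorting, equal rows are adjacent, so the run lengths of the row index give the number of copies of each distinct row. The names x, key, runs and repeated are chosen only for this example.

```r
# sort the rows so that equal rows become adjacent
x <- cars[do.call(order, unname(as.list(cars))), ]
# same key for equal rows
key <- cumsum(!duplicated(x))
# runs$lengths[i] is the number of copies of the i-th distinct row
runs <- rle(key)
# distinct rows that occur more than once
repeated <- x[!duplicated(x), ][runs$lengths > 1, ]
repeated
```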
Petr.