[Rd] setdiff for data frames
G. Jay Kerns
gkerns at ysu.edu
Mon Dec 10 16:53:44 CET 2007
Hello,
I have been interested in setdiff() for data frames that operates
row-wise. I looked in the documentation, mailing lists, etc., and
didn't find exactly the right thing. Given data frames A, B with the
same columns, the goal is to extract the rows that are in A, but not
in B. Of course, one can usually do setdiff(rownames(A), rownames(B))
but that is cheating. :-)
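To make the row-name shortcut concrete, here is a small sketch of why it only works when the row names happen to track the row contents. The copy B2 is a hypothetical object introduced only for illustration.

```r
A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]

# Row names that appear in A but not in B
keep <- setdiff(rownames(A), rownames(B))
A[keep, ]

# B2 has the same contents as B, but its row names are reset to 1..4,
# so the labels no longer identify which rows of A it came from.
B2 <- B
rownames(B2) <- NULL
wrong <- setdiff(rownames(A), rownames(B2))
A[wrong, ]   # keeps row 5 of A (which is in B2) and drops row 1 (which is not)
```

The shortcut compares labels, not values, which is why a genuinely row-wise comparison is needed.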
I played around a little bit and came up with
    setdiff.data.frame <- function(A, B) {
        # TRUE if row y matches at least one row of B
        g <- function(y, B) {
            any(apply(B, 1, FUN = function(x) isTRUE(all.equal(x, y))))
        }
        # keep the distinct rows of A that match no row of B
        unique(A[!apply(A, 1, FUN = function(t) g(t, B)), ])
    }
I am sure that somebody can do this a better/faster way... any ideas?
Any chance we could get a data.frame method for setdiff in future R
versions? (The notion of "set" is somewhat ambiguous with respect to
rows, columns, and entries in the data frame case.)
Jay
P.S. You can see what I'm looking for with
    A <- expand.grid(1:3, 1:3)
    B <- A[2:5, ]
    setdiff.data.frame(A, B)
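
For comparison, one possible vectorized alternative is sketched below. It is only an assumption about what a faster version might look like, not an existing R function: the name setdiff_rows and the "\r" separator are made up for illustration. It pastes each row into a single string key and compares keys with %in%, so it matches on printed values rather than with the tolerance of all.equal().

```r
# Hypothetical helper (illustrative name): distinct rows of A not found in B.
setdiff_rows <- function(A, B) {
  stopifnot(identical(names(A), names(B)))
  # One string key per row; "\r" is an assumed separator that is
  # taken not to occur in the data itself.
  key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  unique(A[!(key(A) %in% key(B)), , drop = FALSE])
}

A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]
res <- setdiff_rows(A, B)
res
```

Because the keys are built column by column, this avoids the nested apply() calls over rows, at the cost of comparing string representations instead of the original values.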
