[Rd] setdiff for data frames
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Dec 11 08:36:49 CET 2007
On Mon, 10 Dec 2007, Charles C. Berry wrote:
> On Mon, 10 Dec 2007, G. Jay Kerns wrote:
>
>> Hello,
>>
>> I have been interested in setdiff() for data frames that operates
>> row-wise. I looked in the documentation, mailing lists, etc., and
>> didn't find exactly the right thing. Given data frames A, B with the
>> same columns, the goal is to extract the rows that are in A, but not
>> in B. Of course, one can usually do setdiff(rownames(A), rownames(B))
>> but that is cheating. :-)
>>
>> I played around a little bit and came up with
>>
>> setdiff.data.frame = function(A, B){
>>   g <- function( y, B){
>>     any( apply(B, 1, FUN = function(x)
>>       identical(all.equal(x, y), TRUE) ) ) }
>>   unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
>> }
>>
>> I am sure that somebody can do this a better/faster way... any ideas?
>
> setdiff.data.frame <-
> function(A,B) A[ !duplicated( rbind(B,A) )[ -seq_len(nrow(B))] , ]
>
> This ignores rownames(A) which may not be what is wanted in every case.

I was about to suggest using the approach taken by duplicated.data.frame
(which is to 'hash' the rows to a character vector) and then calling
setdiff. E.g.

    a <- do.call("paste", c(A, sep = "\r"))
    b <- do.call("paste", c(B, sep = "\r"))
    A[match(setdiff(a, b), a), ]

Note that apply() is intended for matrices (not data frames), and the
apply()-based version given can do a horrendous amount of coercion,
whereas the above does it only once.

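
For illustration, the three lines above can be wrapped into a function and
run on the example data from the original post. This is only a sketch: the
name setdiff_rows is made up here (it is not part of R), and it assumes
that both data frames have the same columns in the same order and that
"\r" does not occur in the data.

```r
# Hypothetical helper, not part of base R: rows of A that do not occur in B.
setdiff_rows <- function(A, B) {
  stopifnot(is.data.frame(A), is.data.frame(B), ncol(A) == ncol(B))
  # 'Hash' each row to a single string, one coercion per data frame.
  # Assumes "\r" never appears inside a value.
  a <- do.call("paste", c(A, sep = "\r"))
  b <- do.call("paste", c(B, sep = "\r"))
  # setdiff() drops duplicates; match() maps each key back to its first row in A.
  A[match(setdiff(a, b), a), , drop = FALSE]
}

# Example data from the original post.
A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]
res <- setdiff_rows(A, B)
print(res)
```

With this example, res keeps rows 1, 6, 7, 8 and 9 of A. Rows are compared
via their character representation, so values that convert to the same
string are treated as equal. The drop = FALSE is an addition to the lines
above so that a one-column data frame is not reduced to a vector.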
>
> HTH,
>
> Chuck
>
>> Any chance we could get a data.frame method for setdiff in future R
>> versions? (The notion of "set" is somewhat ambiguous with respect to
>> rows, columns, and entries in the data frame case.)
No chance: if you have not found it in the archives, it is too rare a
request.
>> Jay
>>
>> P.S. You can see what I'm looking for with
>>
>> A <- expand.grid( 1:3, 1:3 )
>> B <- A[ 2:5, ]
>> setdiff.data.frame(A,B)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595