# [R] Comparing entire row sets at once efficiently

Dirk Eddelbuettel edd at debian.org
Thu Sep 28 17:54:07 CEST 2006

```
Dear useRs,

I am having a hard time coming up with a nice and efficient solution to
a problem involving entire matrices or data.frames. In spirit, this is similar to
what setdiff() and setequal() do, but I need it in more dimensions.

Here's a brief description.

* given a set of factors or sequences, expand.grid() gives me the set
of all combinations in a data.frame;

in my case all arguments are numeric, so I could convert the data frame to
a matrix

let's call this one Candidates

* I have a second matrix (or data frame) to compare to; this second
set may be a subset of the first, or a superset, but it is guaranteed to
contain the same columns

let's call this one Comparison

* I want to know which rows in Candidates are not yet in Comparison.

A toy example:

> Comparison <- matrix(1:30, ncol=5)
> Candidates <- Comparison[c(2,4), ]
> checkRow <- function(r, M) { any( (r[1] == M[,1]) & (r[2] == M[,2]) & (r[3] == M[,3]) & (r[4] == M[,4]) & (r[5] == M[,5]) ) }
> checkRow( Candidates[1,], Comparison)
[1] TRUE
> falseRow <- Candidates[1,]
> falseRow[2] <- 42
> checkRow( falseRow, Comparison)
[1] FALSE
>

The checkRow function works, but it a) is clunky, b) hardcodes the dimension, and
c) works only on one row at a time.

There must be better ways, at least for a) and b).  What am I missing?

Feel free to reply off-list and I'd gladly summarize back to the list. If you
don't want your reply (or email) summarized back, please indicate.

Thanks, Dirk

--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison

```
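A possible base-R answer to points a) to c), sketched under the assumption that all columns are numeric as in the toy example. The helper names checkRowAny(), rowKeys() and rowsInSet() are hypothetical and chosen only for illustration. The single-row check compares r against every row of M in one vectorised comparison instead of spelling out each column. The all-rows version reduces each row to a single string key and then uses %in%, so all of Candidates is checked in one pass rather than row by row.

```r
# Sketch only: checkRowAny(), rowKeys() and rowsInSet() are hypothetical
# helper names, not functions from base R or any package.

# (a, b) Single-row check that works for any number of columns.
# t(M) has one column per row of M, so `t(M) == r` recycles r down each
# column and compares it element-wise against every row of M at once.
checkRowAny <- function(r, M) {
  stopifnot(length(r) == ncol(M))
  any(colSums(t(M) == r) == ncol(M))
}

# (c) All rows at once: paste each row into one string key (the "\r"
# separator cannot occur in numbers), then match keys with %in%.
# X[, j] extracts column j from either a matrix or a data frame.
rowKeys <- function(X) {
  cols <- lapply(seq_len(ncol(X)), function(j) X[, j])
  do.call(paste, c(cols, sep = "\r"))
}

rowsInSet <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  rowKeys(A) %in% rowKeys(B)
}

# The toy data from the post.
Comparison <- matrix(1:30, ncol = 5)
Candidates <- Comparison[c(2, 4), ]
falseRow <- Candidates[1, ]
falseRow[2] <- 42

checkRowAny(Candidates[1, ], Comparison)  # TRUE
checkRowAny(falseRow, Comparison)         # FALSE

# Add the modified row as a third candidate and test all rows together.
Candidates2 <- rbind(Candidates, falseRow)
rowsInSet(Candidates2, Comparison)        # TRUE TRUE FALSE

# Rows of Candidates2 that are not yet in Comparison.
notYetIn <- Candidates2[!rowsInSet(Candidates2, Comparison), , drop = FALSE]
```

One caveat of the key-based approach: paste() converts doubles via as.character(), which keeps 15 significant digits, so two rows that differ only beyond that precision would be treated as equal. For integer-valued data like the toy example this does not matter.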

More information about the R-help mailing list