[R] Intersection of 2 matrices

Hans W Borchers hwborchers at googlemail.com
Fri Dec 2 20:22:31 CET 2011


Michael Kao <mkao006rmail <at> gmail.com> writes:

> 
Your solution is fast, but not completely correct, because you are also 
counting possible duplicates within the second matrix. The 'refitted'
function could look as follows:

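    compMat2 <- function(A, B) {  # number of distinct rows of B present in A
        B0 <- B[!duplicated(B), , drop = FALSE]  # remove duplicate rows within B
        na <- nrow(A); nb <- nrow(B0)
        AB <- rbind(A, B0)
        ab <- duplicated(AB)[na + seq_len(nb)]   # TRUE where a row of B0 occurs in A
        return(sum(ab))
    }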

and testing it on an example of the size the OP was asking for:

    set.seed(8237)
    A  <- matrix(sample(1:1000, 2*67420, replace=TRUE), 67420, 2)
    B  <- matrix(sample(1:1000, 2*59199, replace=TRUE), 59199, 2)

    system.time(n <- compMat2(A, B))  # n = 3790

while compMat() will return 5522 rows, 1732 of which are duplicates within B.
A 3.06 GHz iMac needs about 2 to 2.5 seconds.
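
To make the difference concrete, here is a minimal sketch on a toy example.
The two small matrices and the helper name countCommonRows are made up for
illustration and do not come from the thread; the helper follows the same
idea as compMat2() above.

```r
# Count the distinct rows of B that also occur in A.
# (countCommonRows is a hypothetical name used only for this illustration.)
countCommonRows <- function(A, B) {
    B0 <- B[!duplicated(B), , drop = FALSE]   # unique rows of B
    AB <- rbind(A, B0)
    # Rows of B0 flagged as duplicated must already occur in A.
    sum(duplicated(AB)[nrow(A) + seq_len(nrow(B0))])
}

A <- matrix(c(1, 2,
              3, 4,
              5, 6), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              3, 4,    # repeated row within B
              7, 8,
              1, 2), ncol = 2, byrow = TRUE)

# Without de-duplicating B first, the repeated row (3, 4) is counted twice.
naive <- sum(duplicated(rbind(A, B))[nrow(A) + seq_len(nrow(B))])
exact <- countCommonRows(A, B)
print(c(naive = naive, exact = exact))
```

Here the naive count is 3 and the de-duplicated count is 2, because the row
(3, 4) appears twice in B but should only be counted once.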

Hans Werner


> On 2/12/2011 2:48 p.m., David Winsemius wrote:
> >
> > On Dec 2, 2011, at 4:20 AM, oluwole oyebamiji wrote:
> >
> >> Hi all,
> >>     I have matrix A of 67420 by 2 and another matrix B of 59199 by 2. 
> >> I would like to find the number of rows of matrix B that I can find 
> >> in matrix A (rows that are common to both matrices with or without 
> >> sorting).
> >>
> >> I have tried the "intersection" and "is.element" functions in R but 
> >> they only work for vectors and not for matrices,
> >> i.e.,    intersection(A,B) and is.element(A,B).
> >
> > Have you considered the 'duplicated' function?
> >
> 
> Here is an example based on the duplicated function
> 
> test.mat1 <- matrix(1:20, ncol = 5)   # 4 rows, 5 columns
> 
> test.mat2 <- rbind(test.mat1[sample(nrow(test.mat1), 2), ], matrix(101:120, ncol = 5))
> 
> compMat <- function(mat1, mat2){
>      nr1 <- nrow(mat1)
>      nr2 <- nrow(mat2)
>      mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop = FALSE]
> }
> 
> compMat(test.mat1, test.mat2)
> 
>


