[R] counting the occurrences of vectors
Marc Schwartz
MSchwartz at MedAnalytics.com
Sat Jul 3 17:50:14 CEST 2004
On Sat, 2004-07-03 at 09:31, Ravi Varadhan wrote:
> Hi:
>
> I have two matrices, A and B, where A is n x k, and B is m x k, where
> n >> m >> k. Is there a computationally fast way to count the number
> of times each row (a k-vector) of B occurs in A? Thanks for any
> suggestions.
>
> Best,
> Ravi.
How about something like this:
row.match <- function(m1, m2)
{
  if (ncol(m1) != ncol(m2))
    stop("Matrices must have the same number of columns")

  # turn each row into one list element so rows are compared as a whole
  m1.l <- apply(m1, 1, list)
  m2.l <- apply(m2, 1, list)

  # return a logical vector: TRUE where a row of m1 also occurs in m2
  m1.l %in% m2.l
}
Example of use:
m <- matrix(1:20, ncol = 4, byrow = TRUE)
n <- matrix(1:40, ncol = 4, byrow = TRUE)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20
> n
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    5    6    7    8
 [3,]    9   10   11   12
 [4,]   13   14   15   16
 [5,]   17   18   19   20
 [6,]   21   22   23   24
 [7,]   25   26   27   28
 [8,]   29   30   31   32
 [9,]   33   34   35   36
[10,]   37   38   39   40
> row.match(n, m)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
If you want to extract the rows of n that also occur in m:
> n[row.match(n, m), ]
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20
and if you just want the indices from n:
> which(row.match(n, m))
[1] 1 2 3 4 5
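Note that row.match() reports membership, not counts. If you need the number of times each row of B occurs in A, as in the original question, something along these lines might work. This is only a sketch: count.rows is a made-up name, and it assumes that pasting each row into a string key identifies the row uniquely (fine for integers; for doubles the conversion to character may merge values that differ only in the last digits) and that the separator "\r" never appears in the data.

```r
# Hypothetical helper (not part of row.match above): for each row of B,
# count how many rows of A are identical to it.
count.rows <- function(A, B)
{
  if (ncol(A) != ncol(B))
    stop("Matrices must have the same number of columns")

  # Collapse each row into a single string key. "\r" is an assumed
  # separator that should not occur in the data itself.
  key.A <- do.call(paste, c(as.data.frame(A), sep = "\r"))
  key.B <- do.call(paste, c(as.data.frame(B), sep = "\r"))

  # match() gives, for each row of A, the first matching row of B (or NA);
  # tabulate() counts those positions and ignores the NAs.
  counts <- tabulate(match(key.A, key.B), nbins = nrow(B))

  # If B contains duplicate rows, give each duplicate the count of its
  # first occurrence.
  counts[match(key.B, key.B)]
}

A <- rbind(c(1, 2), c(3, 4), c(1, 2), c(5, 6))
B <- rbind(c(1, 2), c(5, 6), c(7, 8))
print(count.rows(A, B))
```

The result has one entry per row of B, in the order of B's rows, with 0 for rows that never occur in A.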
For timing, if I create some large matrices:
> m <- matrix(1:20000, ncol = 4, byrow = TRUE)
> nrow(m)
[1] 5000
> n <- matrix(1:40000, ncol = 4, byrow = TRUE)
> nrow(n)
[1] 10000
> system.time(row.match(n, m))
[1] 0.39 0.01 0.41 0.00 0.00
> length(row.match(n, m))
[1] 10000
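The timing above is from my machine, so it is probably worth rerunning on yours. If it turns out too slow for your n, one possible alternative is to compare string keys instead of lists. The sketch below is an assumption on my part rather than a tested replacement: row.match.key is a hypothetical name, and it relies on pasted keys identifying rows uniquely, with "\r" assumed not to occur in the data.

```r
# Hypothetical key-based variant of row.match(): returns TRUE for each
# row of m1 that also occurs in m2.
row.match.key <- function(m1, m2)
{
  if (ncol(m1) != ncol(m2))
    stop("Matrices must have the same number of columns")

  # One string key per row, built column-wise to avoid a per-row loop
  key1 <- do.call(paste, c(as.data.frame(m1), sep = "\r"))
  key2 <- do.call(paste, c(as.data.frame(m2), sep = "\r"))

  key1 %in% key2
}

# Same test matrices as in the timing example
m <- matrix(1:20000, ncol = 4, byrow = TRUE)
n <- matrix(1:40000, ncol = 4, byrow = TRUE)

res <- row.match.key(n, m)
print(sum(res))                           # number of rows of n found in m
print(system.time(row.match.key(n, m)))   # timings will vary by machine
```

Whether this is actually faster than the list version will depend on the R version and the size of the matrices, so compare both with system.time() on your own data.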
Does that get you what you want?
HTH,
Marc Schwartz