[R] Comparing matrices in R - matrixB %in% matrixA
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Fri Oct 31 16:27:50 CET 2014
Since both of you seem to have misinterpreted my response, consider the
following for clarification:
> A <- matrix(1:1000, 1000, 10)
> B <- A[1:100, ]
> # my recommended solution
> t1 <- system.time({match(as.data.frame(t(B)), as.data.frame(t(A)))})
> # similar to John's recommended solution
> t2 <- system.time({
+ AA <- as.list(as.data.frame(t(A)))
+ BB <- as.list(as.data.frame(t(B)))
+ which( AA %in% BB )
+ })
> # loop version I suggested earlier as an incremental improvement
> t3 <- system.time({
+   lresult <- rep( NA, nrow(A) )
+   for ( ia in seq.int( nrow( A ) ) ) {
+     lres <- FALSE
+     ib <- 0
+     while ( ib < nrow( B ) && !lres ) {
+       ib <- ib + 1
+       lres <- all( A[ ia, ] == B[ ib, ] )
+     }
+     lresult[ ia ] <- lres
+   }
+   which( lresult )
+ })
> t4 <- system.time({
+   res<-c()
+   rowsB = length(B[,1])
+   rowsA = length(A[,1])
+   colsB = length(B[1,])
+   colsA = length(A[1,])
+   for (i in 1:rowsB){
+     for (j in 1:colsB){
+       for (k in 1:rowsA){
+         for (l in 1:colsA){
+           if(A[k,l]==B[i,j]){res<-c(res,k)}
+         }
+       }
+     }
+   }
+   unique(sort(res))
+ })
> t1
   user  system elapsed
  0.022   0.000   0.020
> t2
   user  system elapsed
   0.02    0.00    0.02
> t3
   user  system elapsed
  0.748   0.000   0.746
> t4
   user  system elapsed
 16.612   0.016  16.636
> # data.frames are lists, but applying as.list seems to speed up the
> # match for some reason
> t2[1]/t1[1]
user.self
0.9090909
> # intended comparison for learning purposes
> t4[1]/t3[1]
user.self
22.20856
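The timings above do not show whether the approaches agree, because system.time() returns only the timing and not the value of the timed expression. A minimal check, assuming the same A and B as above (the names r1 and r2 are introduced here only for illustration):

```r
A <- matrix(1:1000, 1000, 10)
B <- A[1:100, ]

# t1 approach: for each row of B, the position of the first identical row of A
r1 <- match(as.data.frame(t(B)), as.data.frame(t(A)))

# t2 approach: which rows of A occur somewhere in B
AA <- as.list(as.data.frame(t(A)))
BB <- as.list(as.data.frame(t(B)))
r2 <- which(AA %in% BB)

# The two answers contain the same row numbers for this data
same_rows <- all(sort(r1) == r2)
```

Note that match() reports positions in the order of B's rows (with NA for rows of B absent from A), while which(AA %in% BB) reports rows of A in increasing order, so the two only coincide like this when the rows are unique. The t1 and t2 timings differ by about two milliseconds in a single run, so the apparent speed-up from as.list() may not be meaningful.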
I recognize that a reference implementation does not need to be
optimized. The changes I suggested to it (t3 above) were meant to
illustrate an incremental step toward "thinking in R", not the optimal
solution; the match() approach (t1) is the one I recommend.
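As a sketch of how the recommended approach could be wrapped for reuse, here it is applied to the toy example from the original question. The helper name row_match and the extra matrix B2 are hypothetical, introduced only for illustration; they do not appear elsewhere in this thread.

```r
# Hypothetical helper: for each row of B, return the index of the first
# identical row of A, or NA if A has no such row.
row_match <- function(B, A) {
  stopifnot(is.matrix(A), is.matrix(B), ncol(A) == ncol(B))
  # After t(), each original row is a column, so as.data.frame() yields one
  # list element per row and match() compares whole rows.
  match(as.data.frame(t(B)), as.data.frame(t(A)))
}

# Toy data from the original question
A <- matrix(1:10, nrow = 5)
B <- A[-c(1, 2, 3), ]
idx <- row_match(B, A)

# A row that is not in A, although each of its values occurs somewhere in A
B2 <- matrix(c(4L, 6L), nrow = 1)
idx2 <- row_match(B2, A)

# Element-wise test that flags the same rows as the four nested loops:
# every row of A that shares any value with B2
any_shared <- which(apply(A, 1, function(r) any(r %in% B2)))
```

Here idx is 4 5, the answer expected in the original question. For B2, idx2 is NA while any_shared is 1 4, which shows that the four-loop version tests for shared values rather than identical rows; the two only agree on data like the toy example. If a single row is extracted from a matrix, use drop = FALSE (for example A[4, , drop = FALSE]) so that it stays a matrix.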
On Fri, 31 Oct 2014, John Fox wrote:
> Dear Jeff,
>
> Out of curiosity, I compared your solution with the one I posted earlier this morning (when I was working on a slower computer, which accounts for the somewhat different timings for my solution):
>
> ------------ snip ----------
>
>> A <- matrix(1:10000, 10000, 10)
>> B <- A[1:1000, ]
>>
>> system.time({
> + AA <- as.list(as.data.frame(t(A)))
> + BB <- as.list(as.data.frame(t(B)))
> + print(sum(AA %in% BB))
> + })
> [1] 1000
> user system elapsed
> 0.14 0.01 0.16
>>
>>
>> system.time({
> + lresult <- rep( NA, nrow(A) )
> + for ( ia in seq.int( nrow( A ) ) ) {
> + lres <- FALSE
> + ib <- 0
> + while ( ib < nrow( B ) & !lres ) {
> + ib <- ib + 1
> + lres <- all( A[ ia, ] == B[ ib, ] )
> + }
> + lresult[ ia ] <- lres
> + }
> + print(sum( lresult ))
> + })
> [1] 1000
> user system elapsed
> 45.76 0.01 45.77
>> 46/0.16
> [1] 287.5
>
> ------------ snip ----------
>
> So the solution using nested loops is more than 2 orders of magnitude slower for this problem. Of course, for a one-off problem, depending on its size, the difference may not matter.
>
> Best,
> John
>
> -----------------------------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.socsci.mcmaster.ca/jfox/
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jeff Newmiller
>> Sent: Friday, October 31, 2014 10:15 AM
>> To: Charles Novaes de Santana; r-help at r-project.org
>> Subject: Re: [R] Comparing matrices in R - matrixB %in% matrixA
>>
>> Thank you for the reproducible example, but posting in HTML can corrupt
>> your example code so please learn to set your email client mail format
>> appropriately when posting to this list.
>>
>> I think this [1] post, found with a quick Google search for "R match
>> matrix", fits your situation perfectly.
>>
>> match(data.frame(t(B)), data.frame(t(A)))
>>
>> Note that concatenating vectors in loops is bad news... a basic
>> optimization for your code would be to preallocate a logical result
>> vector and fill in each element with a TRUE/FALSE in the outer loop,
>> and use the which() function on that completed vector to identify the
>> index numbers (if you really need that). For example:
>>
>> lresult <- rep( NA, nrow(A) )
>> for ( ia in seq.int( nrow( A ) ) ) {
>>   lres <- FALSE
>>   ib <- 0
>>   while ( ib < nrow( B ) && !lres ) {
>>     ib <- ib + 1
>>     lres <- all( A[ ia, ] == B[ ib, ] )
>>   }
>>   lresult[ ia ] <- lres
>> }
>> result <- which( lresult )
>>
>> [1] http://stackoverflow.com/questions/12697122/in-r-match-function-for-rows-or-columns-of-matrix
>>
>> On October 31, 2014 6:20:38 AM PDT, Charles Novaes de Santana
>> <charles.santana at gmail.com> wrote:
>>> My apologies, I sent the previous message before finishing it. I am very
>>> sorry about this. Please find my complete message below (I usually write
>>> my messages from the end to the beginning... sorry :)).
>>>
>>> Dear all,
>>>
>>> I am trying to compare two matrices in order to find which rows of a
>>> matrix A contain the same values as the rows of a matrix B. I am trying
>>> to do it for matrices with around 2500 elements, but please find below a
>>> toy example:
>>>
>>> A = matrix(1:10,nrow=5)
>>> B = A[-c(1,2,3),];
>>>
>>> So
>>>> A
>>>      [,1] [,2]
>>> [1,]    1    6
>>> [2,]    2    7
>>> [3,]    3    8
>>> [4,]    4    9
>>> [5,]    5   10
>>>
>>> and
>>>> B
>>>      [,1] [,2]
>>> [1,]    4    9
>>> [2,]    5   10
>>>
>>> I would like to compare A and B in order to find which rows of A match
>>> the rows of B, something similar to %in% for one-dimensional arrays.
>>> In the example above, the answer should be 4 and 5.
>>>
>>> I wrote a function to do it (see below). It gives the correct answer
>>> for this toy example, but the excess of for-loops makes it extremely slow
>>> for larger matrices. I was wondering if there is a better way to do this
>>> kind of comparison. Any ideas? Sorry if it is a stupid question.
>>>
>>> matbinmata<-function(B,A){
>>>   res<-c();
>>>   rowsB = length(B[,1]);
>>>   rowsA = length(A[,1]);
>>>   colsB = length(B[1,]);
>>>   colsA = length(A[1,]);
>>>   for (i in 1:rowsB){
>>>     for (j in 1:colsB){
>>>       for (k in 1:rowsA){
>>>         for (l in 1:colsA){
>>>           if(A[k,l]==B[i,j]){res<-c(res,k);}
>>>         }
>>>       }
>>>     }
>>>   }
>>>   return(unique(sort(res)));
>>> }
>>>
>>>
>>> Best,
>>>
>>> Charles
>>>
>>>
>>>
>>>
>>> --
>>> Um axé! :)
>>>
>>> --
>>> Charles Novaes de Santana, PhD
>>> http://www.imedea.uib-csic.es/~charles
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k