[R] incrementing matrix elements more efficiently
Romain Francois
romain.francois at dbmail.com
Sun Aug 15 09:34:11 CEST 2010
Le 15/08/10 02:43, david h shanabrook a écrit :
> I need to increment cells of a matrix (collusionM). The indexes to increment are in an index (matchIndex). This is sample code
>
> library(seqinr)
> library(Matrix)
> x<- "abcabcabc"
> mx<- s2c(x)
> collisionM<- Matrix(0,nrow=10, ncol=10, sparse=TRUE)
> matchIndex<- list()
> rows<- length(mx)
> for (j in 1:(rows)) matchIndex[[j]]<- which(mx[j]==mx)
> for (j in 1:(rows)) collisionM[j,matchIndex[[j]]]<- collisionM[j,matchIndex[[j]]] + 1
>
> Works fine, except with my data (rows=32000) it is too slow. The problem is the second for loop, where it increments the index of the sparse matrix; this needs to be rewritten so it is more efficient. Any ideas?
Hi,
Nice exercise to demonstrate matrix indexing. First let's look at it on
a simpler example:
> x <- matrix( 1:16, nr = 4 )
> x
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> y <- cbind( c(1,2,3,4), c(1,2,3,4) )
> y
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
> x[ y ]
[1] 1 6 11 16
Now we can apply it to your example, below f is your code and g uses
matrix indexing:
f <- function(x = "abcabcabc"){
mx <- s2c(x)
rows <- length(mx)
collisionM <- Matrix(0,nrow=rows, ncol=rows, sparse=TRUE)
matchIndex <- list()
for (j in 1:(rows)) matchIndex[[j]] <- which(mx[j]==mx)
for (j in 1:(rows)) collisionM[j,matchIndex[[j]]] <-
collisionM[j,matchIndex[[j]]] + 1
collisionM
}
g <- function( x = "abcabcabc"){
mx <- s2c(x)
rows <- length(mx)
collisionM <- Matrix(0,nrow=rows, ncol=rows, sparse=TRUE)
matchIndex <- do.call( rbind, lapply( 1:rows, function( i){
cbind( i, which( mx[i] == mx ) )
} ) )
collisionM[ matchIndex ] <- collisionM[matchIndex] + 1
collisionM
}
# first check it works as expected :
> all( f() == g() )
[1] TRUE
# then do some timings
> long <- paste( sample(letters, 1000, replace = TRUE ), collapse = "" )
> system.time( f( x = long ) )
user system elapsed
8.022 1.052 9.074
> system.time( g( x = long ) )
user system elapsed
0.079 0.003 0.082
... and it seems we don't even need indexing at all, we can just create
the matrix using sparseMatrix :
h <- function( x = "abcabcabc"){
mx <- s2c(x)
rows <- length(mx)
matchIndex <- do.call( rbind, lapply( 1:rows, function( i){
cbind( i, which( mx[i] == mx ) )
} ) )
sparseMatrix( matchIndex[,1], matchIndex[,2] )
}
which gives some improvement :
> system.time( h( x = long ) )
user system elapsed
0.048 0.000 0.048
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/bzoWrs : Rcpp svn revision 2000
|- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
`- http://bit.ly/aAyra4 : highlight 0.2-2
More information about the R-help
mailing list