[R] fast way to compare two matrices of combinations
Erik Iverson
iverson at biostat.wisc.edu
Thu Mar 13 17:27:58 CET 2008
Hello Mark -
It may help if you provide a (small) set of example input and what you'd
like as your output.
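For instance, something along these lines would do. The gene ids and the name `wanted` are made up here, purely to show the shape of a small example; this is not your real data:

```r
# Made-up stand-in for the real 750-element list (hypothetical gene ids)
sig.tf.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g2", "g3", "g4"),
  c("g1", "g2")            # fewer than 3 ids, so it contributes no triplet
)

# Desired result, written out by hand: each triplet and the number of
# list elements it occurs in ("g2|g3|g4" is in elements 1 and 2)
wanted <- c("g1|g2|g3" = 1, "g1|g2|g4" = 1, "g1|g3|g4" = 1, "g2|g3|g4" = 2)
```

With input and output spelled out like this, it is much easier for the list to check a proposed solution.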
Best,
Erik Iverson
Mark W Kimpel wrote:
> I have a list (length 750), each element containing a vector of unique
> strings (unique gene ids), with length up to ~40 (median 15). I want to
> compile a matrix of all possible triplets and their frequency within
> gene elements. Using combn and a lot of looping, I am accomplishing this
> but it is VERY slow.
>
> I've tried to figure out a way to vectorize this, using "match" and
> "%in%", but can't get my mind around it.
>
> Below is my code. sig.tf.pairs is the list. Suggestions?
>
> Mark
>
>
> ############################################################
> M <- 3 # 3 for triplets, etc.
> ##########################################################
> # count all triplets
> all.triplets <- NULL
> all.count.vec <- NULL
> for (i in 1:length(sig.tf.pairs)){
> if (length(sig.tf.pairs[[i]]) >= M){
> triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
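> # sort the ids within each triplet so identical sets look identical
> for (j in 1:ncol(triplets)){
> o <- order(triplets[,j])
> triplets[,j] <- triplets[o,j]
> }
> # every new triplet starts with a count of 1 (only needs doing once)
> count.vec <- rep(1, ncol(triplets))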
> if (is.null(all.count.vec)){
> all.count.vec <- count.vec
> all.triplets <- triplets
> } else {
> redundant.vec <- NULL
> for (k in 1:ncol(all.triplets)){
> for (m in 1:ncol(triplets)){
> if (length(intersect(triplets[,m], all.triplets[,k])) == M){
> all.count.vec[k] <- all.count.vec[k] + 1
> redundant.vec <- c(redundant.vec, m)
> }
> }
> }
> if(!is.null(redundant.vec)){
> triplets <- triplets[, -redundant.vec, drop = FALSE]
> count.vec <- count.vec[-redundant.vec]
> }
> all.triplets <- cbind(all.triplets, triplets)
> all.count.vec <- c(all.count.vec, count.vec)
> }
> }
> }
> ###################################
>
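One possible way to avoid the nested loops is to turn every sorted triplet into a single string key and let table() do the counting. This is only a sketch, not tested on your data: the toy list, the helper name subset.keys and the "|" separator are assumptions made up here, and it assumes no gene id contains "|".

```r
# Toy stand-in for sig.tf.pairs (made-up gene ids, deliberately unsorted)
sig.tf.pairs <- list(
  c("g3", "g1", "g2", "g4"),
  c("g4", "g2", "g3"),
  c("g1", "g2")
)
M <- 3  # 3 for triplets, etc.

# Hypothetical helper: one string key per M-subset of a vector of ids.
# Sorting first means each subset comes out in sorted order, so the same
# set of ids always yields the same key.
subset.keys <- function(ids, m) {
  if (length(ids) < m) return(character(0))
  combn(sort(ids), m, FUN = paste, collapse = "|")
}

# All keys from all list elements, then one count per distinct key
keys <- unlist(lapply(sig.tf.pairs, subset.keys, m = M))
triplet.counts <- table(keys)

# Back to the layout of the original code: a matrix with one triplet per
# column, plus a parallel vector of counts
all.triplets <- do.call(cbind, strsplit(names(triplet.counts), "|", fixed = TRUE))
all.count.vec <- as.vector(triplet.counts)
```

Since ids are unique within each element, a triplet is counted at most once per element, so the counts are the number of elements containing that triplet. The number of keys still grows quickly with element length (choose(40, 3) is 9880 for a single element), so memory use is worth watching on the full list.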