[R] formatC slow? (or how can I make this function faster?)

hadley wickham h.wickham at gmail.com
Mon Jan 23 04:42:24 CET 2006


I'm trying to convert a matrix of capture occasions to a format that an
external program can read.  The job is basically to take a row of the
matrix, like

> smp[1,]
 [1] 1 1 0 1 1 1 0 0 0 0

and convert it to the equivalent string "1101110000"

I'm having problems doing this in a speedy way.  The simplest solution
(calc_history below, using apply and paste with collapse) takes about 2
seconds for a 10,000 x 10 matrix.  I thought perhaps paste might be
building up the string in an inefficient manner, so I tried using matrix
multiplication and formatC (as in calc_history2).  This is about 25%
faster, but still seems slow.

smp <- matrix(rbinom(100000, 1, 0.5), nrow=10000)

calc_history <- function(smp) {
	apply(smp, 1, paste, collapse="")
}

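calc_history2 <- function(smp) {
	# powers of ten, highest first, so each row becomes one decimal number
	mul <- 10 ^ ((ncol(smp)-1):0)
	# zero-pad to ncol(smp) digits so leading zeros are kept
	as.vector(formatC(smp %*% mul, format="d", width=ncol(smp), flag="0"))
}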

system.time(calc_history(smp))
system.time(calc_history2(smp))
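
To make the idea behind the second version concrete, here is a small sketch of the power-of-ten encoding on single rows and on a tiny matrix. The names encode_row, m and mul are made up for this illustration, and flag is written as the string "0", which is the documented form.

```r
# Hypothetical helper: encode one 0/1 row as a zero-padded digit string
encode_row <- function(row) {
  # powers of ten, highest first: 10^(n-1), ..., 10, 1
  mul <- 10 ^ ((length(row) - 1):0)
  # the dot product is a number whose decimal digits are the row entries
  code <- sum(row * mul)
  # pad with zeros on the left so leading 0s in the row are kept
  formatC(code, format = "d", width = length(row), flag = "0")
}

encode_row(c(1, 1, 0, 1, 1, 1, 0, 0, 0, 0))  # "1101110000"
encode_row(c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))  # "0100000001"

# Check on a small random matrix that the multiplication approach
# gives the same strings as pasting each row
set.seed(1)
m <- matrix(rbinom(50, 1, 0.5), nrow = 5)
mul <- 10 ^ ((ncol(m) - 1):0)
by_paste <- apply(m, 1, paste, collapse = "")
by_mult <- as.vector(formatC(m %*% mul, format = "d",
                             width = ncol(m), flag = "0"))
stopifnot(identical(by_paste, by_mult))
```

The encoding keeps all the digits in a single number, so it works here with 10 columns, but I would not assume it carries over unchanged to much wider matrices.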

Any ideas for improvement?

Thanks,

Hadley



