[R] On the speed of apply and alternatives?
François Pinard
pinard at iro.umontreal.ca
Tue May 9 13:51:00 CEST 2006
[Monty B. ]
>I have to handle a large matrix (1000 x 10001) where in the
>last column i have a value that all the preceding values in the same row
>has to be compared to. I have made the following code :
># generate a (1000 x 10001) matrix, testm
># generate statistics matrix 1000 x 4:
>qnt <- c(0.01, 0.05)
>cmp_fun <- function(x)
>{
> LAST <- length(x)
> smpls <- x[1:(LAST-1)]
> real <- x[LAST]
> ret <- vector(length=length(qnt)*2)
> for (i in 1:length(qnt))
> {
> q_i <- quantile(smpls, qnt[i]) # the quantile i
> m_i <- mean(smpls[smpls<q_i ] ) # mean of obs less than q_i
> ret[i] <- ifelse(real < q_i, 1, 0)
> ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
> }
> ret
>}
>hcvx <- apply(testm, 1, cmp_fun)
>Can anyone advise as to how I can optimize the runtime of this problem?
>All suggestions are welcome!
You may speed it up a bit, not so much, with the following:
stats.testm <- function (testm, qnt=c(0.01, 0.05)) {
quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)
smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
keeps <- smpls < rep(quants, ncol(smpls))
means <- rowSums(smpls * keeps) / rowSums(keeps)
matrix(rbind((reals < quants) + 0,
(reals < quants) * (reals - means)),
length(qnt) * 2)
}
Try it with something like:
gen.testm <- function (n, m) {
matrix(sample(0:99, n * (m + 1), TRUE), n)
}
testm <- gen.testm(100, 100)
stats.testm(testm)
Without checking, I would suspect that quantile is the big consumer.
If you could make it without quantile interpolation, maybe some more
vectorisation could be possible, but in any case, I do not think you can
avoid sorting each row separately, in one way or another (currently done
within quantile).
--
François Pinard http://pinard.progiciels-bpi.ca
More information about the R-help
mailing list