[R] On the speed of apply and alternatives?

Tue May 9 13:51:00 CEST 2006

[Monty B. ]

>I have to handle a large matrix (1000 x 10001) where in the
>last column i have a value that all the preceding values in the same row
>has to be compared to.  I have made the following code :

># generate a (1000 x 10001) matrix, testm
># generate statistics matrix 1000 x 4:

>qnt <- c(0.01, 0.05)
>cmp_fun  <- function(x)
>{
>  LAST <- length(x)
>  smpls <- x[1:(LAST-1)]
>  real  <- x[LAST]

>  ret <- vector(length=length(qnt)*2)
>  for (i in 1:length(qnt))
>  {
>    q_i  <- quantile(smpls, qnt[i])            # the quantile i
>    m_i <- mean(smpls[smpls<q_i ] )     # mean of obs less than q_i
>    ret[i] <- ifelse(real < q_i, 1, 0)
>    ret[length(qnt)+i] <- ifelse(real < q_i, real - m_i, 0)
>  }
>  ret
>}
>hcvx  <- apply(testm, 1, cmp_fun)

>Can anyone advise as to how I can optimize the runtime of this problem?  
>All suggestions are welcome!

You may speed it up a bit, not so much, with the following:

stats.testm <- function (testm, qnt=c(0.01, 0.05)) {
    quants <- apply(testm[, 1:(ncol(testm)-1)], 1, quantile, qnt)
    smpls <- testm[rep(1:nrow(testm), each=length(qnt)), 1:(ncol(testm)-1)]
    reals <- testm[rep(1:nrow(testm), each=length(qnt)), ncol(testm)]
    keeps <- smpls < rep(quants, ncol(smpls))
    means <- rowSums(smpls * keeps) / rowSums(keeps)
    matrix(rbind((reals < quants) + 0,
                 (reals < quants) * (reals - means)),
           length(qnt) * 2)
}

Try it with something like:

gen.testm <- function (n, m) {
    matrix(sample(0:99, n * (m + 1), TRUE), n)
}

testm <- gen.testm(100, 100)
stats.testm(testm)

Without checking, I would suspect that quantile is the big consumer.
If you could make it without quantile interpolation, maybe some more
vectorisation could be possible, but in any case, I do not think you can 
avoid sorting each row separately, in one way or another (currently done 
within quantile).

-- 
François Pinard   http://pinard.progiciels-bpi.ca