[Rd] (PR#11064) how to reproduce...

simon.debernard at altrabio.com simon.debernard at altrabio.com
Fri Jun 13 18:45:25 CEST 2008


You can try this:

data <- cbind("a"=sample(1:100000), "b"=sample(1:100000))
fact <- sample(rep(1:10000, each=10))
# Baseline: time the stock by() on a matrix
system.time(std <- by(data, fact, colSums))
by.matrix <- function (data, INDICES, FUN, ...) {
     if (!is.list(INDICES)) {
         IND <- vector("list", 1)
         IND[[1]] <- INDICES
         names(IND) <- deparse(substitute(INDICES))[1]
     }
     else IND <- INDICES
     FUNx <- function(x) FUN(data[x, , drop = FALSE], ...)
     nd <- nrow(data)
     ans <- eval(substitute(tapply(1:nd, IND, FUNx)), as.data.frame(data))
     attr(ans, "call") <- match.call()
     class(ans) <- "by"
     ans
}
# Same call again; it now dispatches to the by.matrix method defined above
system.time(mod <- by(data, fact, colSums))
all.equal(std, mod)

I get about a 30x speedup.
(I'm not sure why the attributes differ, but I'm sure this can be fixed...)
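
The core of the by.matrix method above is that tapply() receives only the row numbers, and each group's rows are then pulled from the matrix by index. A minimal sketch of that idea on its own, assuming a smaller data set (the names m, g, sums_idx and ref and the sizes are illustrative, not from the original report), with rowsum() used only as an independent check of the values:

```r
# Small stand-in for the data above (sizes are illustrative assumptions).
set.seed(1)
m <- cbind(a = sample(1:1000), b = sample(1:1000))
g <- sample(rep(1:100, each = 10))

# Index-based grouping: tapply() hands each group's row numbers to the
# function, which subsets the matrix directly by those rows.
sums_idx <- tapply(seq_len(nrow(m)), g,
                   function(i) colSums(m[i, , drop = FALSE]))

# Independent reference: rowsum() gives per-group column sums of a matrix.
ref <- rowsum(m, g)

# Stack the per-group results and compare the values only (names dropped).
got <- do.call(rbind, sums_idx)
stopifnot(isTRUE(all.equal(unname(got), unname(ref))))
```

This compares values only; it says nothing about the attribute difference mentioned above.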


