[R] Increasing computation time per column using lapply

Henning Redestig redestig at mpimp-golm.mpg.de
Mon Oct 18 15:51:19 CEST 2004


Hi,

I would be very glad for help with this problem. Using this code:

temp <- function(x, bins, tot) {
   ## split one column into bins and test every bin against tot
   as.numeric(lapply(split(x, bins), wtest, tot))
}

wtest <- function(x, y) {
   ## p-value of a Wilcoxon rank-sum test of x against y
   wilcox.test(x, y)$p.value
}

rs <- function(x, bins) {
   binCount <- length(split(x[, 1], bins))
   tot <- as.numeric(x)   # all values of x pooled into one long vector
   result <- matrix(apply(x, 2, temp, bins, tot),
                    nrow = binCount, byrow = FALSE)
   rownames(result) <- names(split(x[, 1], bins))
   colnames(result) <- colnames(x)
   result
}


where x is a matrix and bins is a grouping vector that splits every
column of x, I find that

 > rs(x, bins)

takes ~100 s to execute when x has 22000 rows and 2 columns and bins
splits each column into 226 groups of similar length. That is all
right, but if I instead increase to 3 columns it takes ~300 s, and with
50 columns it takes > 13 h. I cannot understand why the execution time
does not increase linearly with the number of columns. Memory status is
all fine and I never need to start swapping.
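
In case it helps, data of the same shape can be simulated like this
(the values are made up; only the dimensions match my real case):

set.seed(1)
x <- matrix(rnorm(22000 * 2), nrow = 22000)   # 22000 rows, 2 columns
bins <- factor(sample(seq_len(226), 22000, replace = TRUE))
system.time(rs(x, bins))   # time one call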

I tried removing the temp function and iterating over the columns with
a for-loop instead of apply (roughly as sketched below), but that did
not solve the problem.
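
The loop version looked roughly like this (reconstructed; it replaced
the apply() call inside rs, so binCount and tot are already defined):

result <- matrix(NA, nrow = binCount, ncol = ncol(x))
for (i in seq_len(ncol(x))) {
   ## same work as temp(), one column at a time
   result[, i] <- as.numeric(lapply(split(x[, i], bins), wtest, tot))
}

It shows the same superlinear growth in run time.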



Thanks!

/Henning, redestig at mpimp-golm.mpg.de



