[R] replacing a for-loop with lapply

Huntsinger, Reid reid_huntsinger at merck.com
Mon May 9 19:22:31 CEST 2005


I suggest

1. Transpose "data" once at the beginning. 
2. Replace "apply" with "colSums" to find cols with sum = d. Since you have
logical values, the sum counts the number of TRUEs, and it looks to me like
you want them all TRUE.
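
To illustrate point 2, here is a small sketch on a made-up logical matrix
(the name "m" and its values are hypothetical, chosen only for illustration).
For logical input, a column's product is 1 exactly when its sum equals the
number of rows, so both approaches flag the same columns:

```r
# Hypothetical 3 x 4 logical matrix, filled column by column
m <- matrix(c(TRUE,  TRUE,  TRUE,
              TRUE,  FALSE, TRUE,
              TRUE,  TRUE,  TRUE,
              FALSE, FALSE, TRUE), nrow = 3)
d <- nrow(m)

# Column-wise product: 1 when every entry in the column is TRUE, else 0
via_prod <- apply(m, 2, prod)

# Column-wise sum: equals d under exactly the same condition
via_sum <- colSums(m) == d

# Both approaches count the same number of all-TRUE columns
sum(via_prod) == sum(via_sum)
```

colSums runs in compiled code over the whole matrix, whereas apply calls prod
once per column from R, which is where most of the saving should come from.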

With further work you could vectorize this, but loops in R are actually
pretty good once you can streamline the code inside. 
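
As a rough sketch of what "vectorize" could look like (this is one possible
approach, not something I have timed here): loop over the d columns instead of
the n rows and build an n x n logical matrix with outer(). The names
"dominated" and "Chat_vec" are made up for this example, and the seed is
arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
data <- matrix(runif(300), 100, 3)   # same shape as the test data in this thread
n <- nrow(data)
d <- ncol(data)

# dominated[j, i] is TRUE when row j is <= row i in every coordinate
dominated <- matrix(TRUE, n, n)
for (k in seq_len(d)) {
  dominated <- dominated & outer(data[, k], data[, k], "<=")
}

# Column i counts the rows that are componentwise <= row i
Chat_vec <- colSums(dominated) / (n + 1)
```

The trade-off is memory: the n x n logical matrix grows quadratically, so for
n = 5000 it holds 25 million entries (on the order of 100 MB, plus the
temporary created by outer()). Whether that is acceptable depends on the
machine.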

I get

> system.time(for(i in 1:n) Chat[i] <-
sum(apply(t(data)<=data[i,],2,prod))/(n+1))
[1] 0.62 0.01 0.73   NA   NA

while with

> tdata <- t(data)

I get much improved

> system.time(for(i in 1:n) Chat[i] <- sum(colSums(tdata <= tdata[,i]) ==
d)/(n+1))
[1] 0.04 0.00 0.04   NA   NA
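
To check that the two loops really compute the same values, something along
these lines could be run. The function names "chat_apply" and "chat_colsums"
and the seed are my own choices for this sketch, not part of the original code.

```r
# Original formulation: transpose and apply() inside the loop
chat_apply <- function(data) {
  n <- nrow(data)
  Chat <- numeric(n)
  for (i in seq_len(n)) {
    Chat[i] <- sum(apply(t(data) <= data[i, ], 2, prod)) / (n + 1)
  }
  Chat
}

# Streamlined formulation: transpose once, use colSums() and compare with d
chat_colsums <- function(data) {
  n <- nrow(data)
  d <- ncol(data)
  tdata <- t(data)
  Chat <- numeric(n)
  for (i in seq_len(n)) {
    Chat[i] <- sum(colSums(tdata <= tdata[, i]) == d) / (n + 1)
  }
  Chat
}

set.seed(42)                         # arbitrary seed, for reproducibility only
data <- matrix(runif(300), 100, 3)
all.equal(chat_apply(data), chat_colsums(data))
```

Wrapping each version in a function also makes it easy to pass either one to
system.time() on data of the size you actually intend to use.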

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Daniel Berg
Sent: Monday, May 09, 2005 12:32 PM
To: r-help at stat.math.ethz.ch
Subject: [R] replacing a for-loop with lapply


Dear All,

I am trying to compute a goodness-of-fit statistic for a copula, based on an
empirical density estimate of this copula. 
To do this I can use the following code:


> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)


However, I have a feeling this can be done more efficiently than with a
for-loop. I have also tried the following:


> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))


but there is no improvement. I ran the following timing test:


> data <- matrix(runif(300),100,3)
> n = dim(data)[1]
> d = dim(data)[2]
> Chat = vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))})[3]}
> b <- rep(0,M)
> for(m in 1:M){
+ b[m] <- system.time(
+ for (i in 1:n)
+ Chat[i] = sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}
> summary(a)
> summary(b)


and the output was:


> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800 
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900


Is there any way I can code this more efficiently in R or will I have to
turn to C? The data sets, on which I am actually going to run this code,
will be of sizes up to (5000x100) and I need hundreds of realizations...

Thank you for your time.

Rgds,
Daniel




