[R] replacing a for-loop with lapply
Daniel Berg
daniel at nr.no
Mon May 9 18:31:50 CEST 2005
Dear All,
I am trying to compute a goodness-of-fit statistic for a copula, based on an
empirical density estimate of this copula.
To do this I can use the following code:
> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- rep(0,n)
> for(i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1)
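To make explicit what the loop computes: Chat[i] is the number of observations j whose components are all less than or equal to the corresponding components of observation i, divided by n+1. A minimal sketch on made-up toy data (the matrix `toy` and its three points are purely illustrative assumptions, not part of the real data):

```r
# Toy data (hypothetical): 3 observations in 2 dimensions.
# Rows are (0.2, 0.3), (0.5, 0.1), (0.9, 0.8).
toy <- matrix(c(0.2, 0.5, 0.9,
                0.3, 0.1, 0.8), nrow = 3)
n <- nrow(toy)

Chat <- numeric(n)
for (i in seq_len(n)) {
  # t(toy) is d x n; toy[i, ] is recycled down each column, so column j
  # holds the componentwise comparison of observation j with observation i.
  below <- t(toy) <= toy[i, ]
  # prod() over a column is 1 only if every component comparison is TRUE.
  Chat[i] <- sum(apply(below, 2, prod)) / (n + 1)
}

print(Chat)  # 0.25 0.25 0.75
```

Only the third point dominates all three observations (including itself), so it gets 3/4; the first two dominate only themselves and get 1/4.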
However, I have a feeling this can be done more efficiently than with a
for-loop. I have also tried the following:
> tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
> tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
> Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)
but there is no improvement in run time. I ran the following timing test:
> data <- matrix(runif(300),100,3)
> n <- dim(data)[1]
> d <- dim(data)[2]
> Chat <- vector("numeric",n)
> M <- 30
> a <- rep(0,M)
> for(m in 1:M){
+ a[m] <- system.time({
+ tmp1 <- lapply(1:n,function(i) t(data)<=data[i,])
+ tmp2 <- lapply(1:n,function(i) apply(tmp1[[i]],2,prod))
+ Chat <- as.numeric(lapply(1:n, function(i) sum(tmp2[[i]])))/(n+1)})[3]}
> b <- rep(0,M)
> for(m in 1:M){
+ b[m] <- system.time(
+ for (i in 1:n)
+ Chat[i] <- sum(apply(t(data)<=data[i,],2,prod))/(n+1))[3]}
> summary(a)
> summary(b)
and the output (elapsed time in seconds per run) was:
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8500  0.8700  0.8900  0.9013  0.9300  0.9800
> summary(b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8400  0.8600  0.8800  0.8883  0.9075  0.9900
Is there any way I can code this more efficiently in R, or will I have to
turn to C? The data sets on which I am actually going to run this code
will be of sizes up to 5000 x 100, and I need hundreds of realizations.
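For reference, two directions that might reduce the overhead of the per-row apply() call are sketched below. These are untimed sketches under stated assumptions, not a verified solution: the function names `chat_rowwise` and `chat_outer` are made up for illustration, and both assume `data` is a numeric n x d matrix with observations in rows. The first keeps one pass per observation but replaces apply(..., 2, prod) with colSums(); the second loops over the d columns instead and builds n x n comparison matrices with outer().

```r
# Variant 1 (hypothetical name): still one step per observation, but the
# transpose is computed once and colSums() replaces apply(..., 2, prod).
chat_rowwise <- function(data) {
  n <- nrow(data)
  d <- ncol(data)
  tdata <- t(data)  # d x n, computed once
  counts <- vapply(seq_len(n), function(i) {
    # A column sums to d only if all d component comparisons are TRUE.
    sum(colSums(tdata <= data[i, ]) == d)
  }, numeric(1))
  counts / (n + 1)
}

# Variant 2 (hypothetical name): loop over the d columns instead of the
# n rows. below[j, i] stays TRUE only while data[j, k] <= data[i, k]
# holds for every column k processed so far.
chat_outer <- function(data) {
  n <- nrow(data)
  below <- matrix(TRUE, n, n)
  for (k in seq_len(ncol(data))) {
    below <- below & outer(data[, k], data[, k], "<=")
  }
  colSums(below) / (n + 1)
}

# Check both variants against the original loop on small random data.
set.seed(1)
x <- matrix(runif(60), 20, 3)
ref <- numeric(nrow(x))
for (i in seq_len(nrow(x))) {
  ref[i] <- sum(apply(t(x) <= x[i, ], 2, prod)) / (nrow(x) + 1)
}
stopifnot(isTRUE(all.equal(chat_rowwise(x), ref)))
stopifnot(isTRUE(all.equal(chat_outer(x), ref)))
```

A caveat on the second variant: each n x n logical matrix takes on the order of 100 MB if n = 5000 (assuming 5000 is the number of rows), and several are alive at once, so memory rather than speed may become the limiting factor. Whether either variant is fast enough to avoid C would have to be measured on the real data sizes.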
Thank you for your time.
Rgds,
Daniel