[R] Possible Improvement of the R code

Berend Hasselman bhh at xs4all.nl
Mon Sep 17 06:27:57 CEST 2012


On 17-09-2012, at 00:51, li li wrote:

> Dear all,
>   In the following code, I am trying to compute each row of "param"
> iteratively, starting from the first row.
>   This is probably not the best way. Can anyone suggest a simpler way
> to write the code?
>   Thanks a lot!
>         Hannah
> 
> 
> param <- matrix(0, 11, 5)
> 
> colnames(param) <- c("p", "q", "r", "2s", "t")
> 
> param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)
> 
> for (i in 2:11){
> 
> param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
> 
> param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
> 
> param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
> 
> param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-1),2]*(param[(i-1),3]+param[(i-1),4]/2)
> 
> param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
> 
> }
> 

You can use the compiler package to byte-compile the function.
It also helps if you don't repeat calculations. For example, (param[(i-1),3]+param[(i-1),4]/2) is computed three times in each iteration, and so is (param[(i-1),4]/2+param[(i-1),5]).
Once each is enough.

See this example where your code has been put in function f1. The simplified code is in function f3.
Functions f2 and f4 are the compiled versions of f1 and f3.

library(compiler)
library(rbenchmark)
param <- matrix(0, 11, 5)
colnames(param) <- c("p", "q", "r", "2s", "t")
param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)

# your calculation
f1 <- function(param) {
    for (i in 2:11){
        param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
        param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
        param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
        param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-1),2]*(param[(i-1),3]+param[(i-1),4]/2)
        param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
    }

    param
}

f2 <- cmpfun(f1)

# modified: repeated sub-expressions are replaced by the already computed param[i,1] and param[i,2]
f3 <- function(param) {
    for (i in 2:11){
        param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
        param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
        param[i,3] <- param[(i-1),1]*param[i,1]
        param[i,4] <- param[(i-1),1]*param[i,2]+param[(i-1),2]*param[i,1]
        param[i,5] <- param[(i-1),2]*param[i,2]
    }

    param
}

f4 <- cmpfun(f3)

z1 <- f1(param)
z2 <- f2(param)
z3 <- f3(param)
z4 <- f4(param)

Running this in R gives:

> all.equal(z2,z1)
[1] TRUE
> all.equal(z3,z1)
[1] TRUE
> all.equal(z4,z1)
[1] TRUE
>
> benchmark(f1(param), f2(param), f3(param), f4(param),replications=5000, columns=c("test", "replications", "elapsed", "relative"))
       test replications elapsed relative
1 f1(param)         5000   3.748    2.502
2 f2(param)         5000   2.104    1.405
3 f3(param)         5000   2.745    1.832
4 f4(param)         5000   1.498    1.000
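
If rbenchmark is not installed, a rough timing can be had with system.time from base R. This is only a minimal sketch; time_it is a made-up helper name, not part of any package, and the sqrt call is there just to show the helper runs.

```r
# Rough timing with base R only.
# time_it is a hypothetical helper: it calls f(arg) `replications` times
# and returns the elapsed wall-clock time in seconds.
time_it <- function(f, arg, replications = 5000) {
    system.time(for (k in seq_len(replications)) f(arg))[["elapsed"]]
}

# Demo on a base function so the snippet runs on its own.
elapsed <- time_it(sqrt, 2, replications = 1000)
```

With the functions from this message defined, time_it(f1, param) and time_it(f4, param) can be compared in the same way. The absolute numbers will differ from machine to machine.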

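f4 is quite an improvement over f1: it runs about 2.5 times as fast (1.498 versus 3.748 seconds elapsed for 5000 replications).
Removing the repeated sub-expressions (f3) and compiling (f2) each help on their own, and f4 combines both.
It's quite possible that more can be gained, but I'm too lazy to investigate further.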
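
One direction that might be worth trying (a sketch only, I have not timed it): read the previous row once per iteration and write the new row in a single assignment, so the matrix is indexed less often. The name f5 is made up for this illustration.

```r
# Starting values from the original post.
param <- matrix(0, 11, 5)
colnames(param) <- c("p", "q", "r", "2s", "t")
param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)

# f5 (hypothetical name): same recurrence as f3, but the previous row is
# read once and the new row is assigned in one step.
f5 <- function(param) {
    for (i in seq_len(nrow(param))[-1]) {   # rows 2, 3, ..., nrow(param)
        prev <- param[i-1,]                 # previous row as a plain vector
        p <- prev[[1]]
        q <- prev[[2]]
        a <- prev[[3]] + prev[[4]]/2        # new value for column 1
        b <- prev[[4]]/2 + prev[[5]]        # new value for column 2
        param[i,] <- c(a, b, p*a, p*b + q*a, q*b)
    }

    param
}

z5 <- f5(param)
```

all.equal(z5, z1) should return TRUE when z1 has been computed as above. Whether f5 or its compiled version cmpfun(f5) is actually faster than f4 would have to be checked with benchmark.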

Berend



