[R] applying data generating function

Mon Mar 8 05:02:17 CET 2004

Regarding your comment on speed varying when replicating the
runs, try running gc() first.
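Here is a minimal sketch of that advice in base R. It forces a full garbage collection before the clock starts, so garbage left over from an earlier run is less likely to be collected during the measurement. The helper names timed_elapsed and grow_by_append are hypothetical. In current versions of R, system.time() does the same thing through its gcFirst argument, which defaults to TRUE.

```r
# Hypothetical helper: elapsed (wall-clock) seconds for one call of fun(),
# measured after a full garbage collection.
timed_elapsed <- function(fun) {
  invisible(gc())                       # collect leftover garbage first
  start <- proc.time()
  fun()
  (proc.time() - start)[["elapsed"]]
}

# Hypothetical workload: grow a vector one element at a time.
grow_by_append <- function(n) {
  v <- c()
  for (i in seq_len(n)) v <- append(v, i)
  v
}

t_manual  <- timed_elapsed(function() grow_by_append(1e4))
# system.time() runs gc() itself when gcFirst = TRUE (the default)
t_builtin <- system.time(grow_by_append(1e4), gcFirst = TRUE)[["elapsed"]]
c(manual = t_manual, system.time = t_builtin)
```

The two numbers will differ from run to run. The point is only that both start from a freshly collected heap.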

---
Date:   Sun, 07 Mar 2004 17:56:46 -0800 
From:   Spencer Graves <spencer.graves at pdf.com>
To:   Peter Dalgaard <p.dalgaard at biostat.ku.dk> 
Cc:   Fred J. <phddas at yahoo.com>,r-help <r-help at stat.math.ethz.ch> 
Subject:   Re: [R] applying data generating function 

Peter's enumeration of alternatives inspired me to compare compute 
times for N = 10^(1:5), with the following results: 

*** R 1.8.1 under Windows 2000, IBM ThinkPad T30 (elapsed seconds):
N                                10     100    1000   10000   1e+05
for loop                          0    0.01    0.09    1.27  192.05
gen e + for loop                  0    0.00    0.03    0.22    2.58
create storage + for loop         0    0.01    0.05    0.34    3.45
sapply                            0    0.00    0.04    0.28    3.82
replicate                         0    0.01    0.05    0.29    4.02

I repeated this with the "for loop" both first and last. The 
times tended to decline on replication: successive "for loop" times for 
N = 1e5 were 182.02, 126.04 ("for loop" last), 130.30 ("for loop" 
last), and 118.64 ("for loop" first again). 

Conclusions: 

(1) Apparently, in some cases, R picks up speed upon replication.

(2) The first 3 times for the "for loop" with N = 1e5 made me 
wonder whether there was an order effect, with the "for loop" taking longer 
in the first position. However, the last run, with the "for loop" first 
again, had the shortest time (118.64), contradicting that hypothesis. 
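One way to separate an order effect from a warm-up effect is to repeat the whole comparison several times and shuffle the order of the methods in each round, then compare medians. This is a sketch under the assumption that each method is wrapped as a zero-argument function in a named list. The harness time_methods and the list names are hypothetical. The generator f is the one from the thread.

```r
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, .001)

# Hypothetical harness: time each function 'reps' times, visiting them in a
# fresh random order every round so no method is always first or last.
time_methods <- function(fns, reps = 5) {
  res <- matrix(NA_real_, nrow = reps, ncol = length(fns),
                dimnames = list(NULL, names(fns)))
  for (r in seq_len(reps)) {
    for (nm in sample(names(fns))) {           # random order each round
      res[r, nm] <- system.time(fns[[nm]]())[["elapsed"]]
    }
  }
  res
}

N <- 1e4
candidates <- list(
  append = function() {                        # grows v on every step
    v <- c(); x <- .1
    for (i in 1:N) { x <- f(x); v <- append(v, x) }
    v
  },
  preallocate = function() {                   # fills preallocated storage
    v <- numeric(N); x <- .1
    for (i in 1:N) { x <- f(x); v[i] <- x }
    v
  }
)

set.seed(1)
timings <- time_methods(candidates, reps = 3)
timings                      # one row per round, one column per method
apply(timings, 2, median)    # median elapsed seconds per method
```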

By comparison, I also tried this under S-Plus 6.2: 

*** S-Plus 6.2, Windows 2000, IBM ThinkPad T30 ("for loop" first; elapsed seconds):
N                                 10      100     1000    10000   100000
for loop                        0.01     0.05    0.331    3.976  273.073
gen e + for loop                0.00     0.04    0.320    3.154   29.112
create storage + for loop       0.01     0.03    0.231    2.113   22.242
sapply                          0.00     0.04    0.380    4.757   23.003

The script I used appears below. As Peter said, "the only really 
crucial [issue] is to avoid the inefficient append by preallocating" the 
vectors to be generated. Even so, this matters only for long loops; in this 
example the threshold lies between 1e4 and 1e5 iterations. For shorter 
loops, the programmer's time is far more valuable. 
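The threshold exists because append(v, x) builds a new vector of length i at step i. The append loop therefore copies roughly N^2/2 elements in total, while the preallocated loop writes each value into existing storage once. The sketch below times both versions at a few sizes. The function names are hypothetical, and the absolute numbers depend on the machine and R version. What should grow with N is the ratio between the two columns. The identical() check confirms that the two versions differ only in speed, not in the values they produce.

```r
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, .001)

# Hypothetical helpers: the same recursion, growing vs. preallocating the result.
gen_append <- function(N) {
  v <- c(); x <- .1
  for (i in seq_len(N)) { x <- f(x); v <- append(v, x) }  # new length-i vector each step
  v
}
gen_prealloc <- function(N) {
  v <- numeric(N); x <- .1
  for (i in seq_len(N)) { x <- f(x); v[i] <- x }          # writes into existing storage
  v
}

# Same seed, same draws: the results should be identical
set.seed(123); a <- gen_append(1000)
set.seed(123); b <- gen_prealloc(1000)
identical(a, b)

sizes <- c(2000, 4000, 8000, 16000)
elapsed <- t(sapply(sizes, function(N) c(
  N           = N,
  append      = system.time(gen_append(N))[["elapsed"]],
  preallocate = system.time(gen_prealloc(N))[["elapsed"]])))
elapsed
```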

Enjoy. spencer graves
####################

N.gen <- c(10, 100, 1000, 10000, 1e5)
mtds <- c("for loop", "gen e + for loop", "create storage + for loop",
          "sapply", "replicate")
m <- length(N.gen)
# elapsed (wall-clock) seconds: one row per N, one column per method
ellapsed.time <- array(NA, dim=c(m, length(mtds)))
dimnames(ellapsed.time) <- list(N.gen, mtds)

for(iN in 1:m){
  cat("\n", iN, "")
  N <- N.gen[iN]

  # for loop: grow v with append(), which copies v on every iteration
  set.seed(123)
  start.time <- proc.time()
  f <- function(x.) { 3.8*x.*(1-x.) + rnorm(1, 0, .001) }
  v <- c()
  x <- .1  # starting point
  for (i in 1:N) { x <- f(x); v <- append(v, x) }
  ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]
  cat(mtds[1], "")

  # gen e + for loop: draw all N noise terms at once, fill a preallocated X
  set.seed(123)
  start.time <- proc.time()
  e <- 0.001*rnorm(N)
  X <- rep(0.1, N+1)   # X[1] is the starting point
  for(i in 2:(N+1))
    X[i] <- 3.8*X[i-1]*(1-X[i-1]) + e[i-1]
  ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
  cat(mtds[2], "")

  # create storage + for loop: preallocate V, call f() once per step
  set.seed(123)
  start.time <- proc.time()
  V <- numeric(N)
  xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
  ellapsed.time[iN, "create storage + for loop"] <-
    (proc.time()-start.time)[3]
  cat(mtds[3], "")

  # sapply: implicit loop; the state xa is updated with <<-
  set.seed(123)
  start.time <- proc.time()
  xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
  ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]
  cat(mtds[4], "")

  # replicate: exists in R but not in S-Plus 6.2; R's version list has a
  # 'language' component ("R"), which this test uses to detect R
  if(!is.null(version$language)){
    set.seed(123)
    start.time <- proc.time()
    z <- .1 ; vr <- replicate(N, z <<- f(z))
    ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
    cat(mtds[5], "")
  }
}

t(ellapsed.time)
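Every method reseeds with set.seed(123) and consumes the random numbers in the same order, so all of them should produce the same series. Checking this is a cheap guard against timing a method that computes something different. In the sketch below, the methods are wrapped in hypothetical functions (by_loop, by_gen_e, by_sapply, by_replicate), so the <<- assignments update a variable local to each function rather than a global one. It assumes a current version of R. Peter's note about the variable-capture problem with replicate() in R 1.8.1 explains why z, not x, is used there. N is kept small because the map is chaotic at 3.8, which makes the comparison sensitive to any numerical difference.

```r
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, .001)

by_loop <- function(N) {              # create storage + for loop
  set.seed(123)
  V <- numeric(N); xv <- .1
  for (i in seq_len(N)) { xv <- f(xv); V[i] <- xv }
  V
}
by_gen_e <- function(N) {             # gen e + for loop
  set.seed(123)
  e <- 0.001 * rnorm(N)
  X <- rep(0.1, N + 1)
  for (i in 2:(N + 1)) X[i] <- 3.8 * X[i - 1] * (1 - X[i - 1]) + e[i - 1]
  X[-1]                               # drop the starting point
}
by_sapply <- function(N) {
  set.seed(123)
  xa <- .1
  # <<- finds xa in by_sapply's own frame, not in the global environment
  sapply(seq_len(N), function(i) xa <<- f(xa))
}
by_replicate <- function(N) {
  set.seed(123)
  z <- .1
  replicate(N, z <<- f(z))
}

N <- 20
ref <- by_loop(N)
sapply(list(gen_e = by_gen_e(N), sapply = by_sapply(N),
            replicate = by_replicate(N)),
       function(v) isTRUE(all.equal(v, ref)))
```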
#############################
Peter Dalgaard wrote:

>Christophe Pallier <pallier at lscp.ehess.fr> writes:
>
>>Fred J. wrote:
>>
>>>I need to generate a data set based on this equation
>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
>>>N(0, 0.001) random variable.
>>>I need, say, 100 values.
>>>
>>>How do I do this?
>>> 
>>>
>>I assume X(t) and x(t) are the same (?).
>>
>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
>>v=c()
>>x=.1 # starting point
>>for (i in 1:100) { x=f(x); v=append(v,x) }
>>
>>There may be smarter ways...
>
>Yes, but the only really crucial one is to avoid the inefficient append by
>preallocating the v: 
>
>v <- numeric(100)
>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
>
>apart from that you can use implicit loops:
>
>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
>
>or
>
>z <- .1 ; v <- replicate(100, z <<- f(z))
>
>(You cannot use x there because of a variable capture issue which is a
>bit of a bug. I intend to fix it for 1.9.0.)
>
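Another implicit loop avoids <<- altogether: Reduce() with accumulate = TRUE, which passes each new value on to the next step and returns the whole trajectory. This technique does not appear in the thread. gen_series and its arguments (x0, r, noise_sd) are hypothetical names, and noise_sd is taken to be the standard deviation passed to rnorm(), as in the replies above. The noise is drawn in the same order as in the loops above, so with the same seed the result should match the preallocated loop.

```r
# Hypothetical generator: x(t) = r * x(t-1) * (1 - x(t-1)) + e(t),
# with e(t) drawn by rnorm(1, 0, noise_sd)
gen_series <- function(N, x0 = .1, r = 3.8, noise_sd = .001) {
  path <- Reduce(function(x, i) r * x * (1 - x) + rnorm(1, 0, noise_sd),
                 seq_len(N), init = x0, accumulate = TRUE)
  unlist(path)[-1]    # Reduce() keeps x0 as the first element; drop it
}

set.seed(123)
v <- gen_series(100)

# The preallocated loop from the thread, for comparison
f <- function(x) 3.8 * x * (1 - x) + rnorm(1, 0, .001)
set.seed(123)
w <- numeric(100); x <- .1
for (i in 1:100) { x <- f(x); w[i] <- x }
all.equal(v, w)
```

Reduce() still calls the function once per step, so this is a matter of style rather than speed.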