[R] applying data generating function

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 8 08:29:48 CET 2004


You need to run gc() before running such timings in R, as the first run 
often has to pay for a level-0 garbage collection.  That is normally the 
cause of (1), although I haven't seen differences as large as 10 secs (but 
have no idea of the speed of your machine, and have seen 3 secs).
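
As a minimal sketch of that advice (not part of the original script; the
helper name time.after.gc is made up for illustration), one can collect
garbage explicitly just before starting the clock, so that the first timed
run does not absorb the cost of a collection triggered by earlier work:

```r
# Logistic-map step with small Gaussian noise, as in the script quoted below.
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, 0.001)

# Hypothetical helper: run gc() first, then time a preallocated loop of length N.
time.after.gc <- function(N) {
  invisible(gc())                      # collect before the clock starts
  start.time <- proc.time()
  V <- numeric(N)
  xv <- 0.1
  for (i in seq_len(N)) { xv <- f(xv); V[i] <- xv }
  as.numeric((proc.time() - start.time)[3])   # elapsed seconds
}

set.seed(123)
timings <- sapply(rep(10000, 3), time.after.gc)
timings
```

In current versions of R, system.time() has a gcFirst argument that
defaults to TRUE and does the same job; the explicit gc() matters when
timing by hand with proc.time(), as the quoted script does.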

On Sun, 7 Mar 2004, Spencer Graves wrote:

>       Peter's enumeration of alternatives inspired me to compare compute 
> times for N = 10^(1:5), with the following results:
> 
> *** R 1.8.1 under Windows 2000, IBM Thinkpad T30: 
>                           10  100 1000 10000  1e+05
> for loop                   0 0.01 0.09  1.27 192.05
> gen e + for loop           0 0.00 0.03  0.22   2.58
> create storage + for loop  0 0.01 0.05  0.34   3.45
> sapply                     0 0.00 0.04  0.28   3.82
> replicate                  0 0.01 0.05  0.29   4.02
> 
>       I repeated this with the "for loop" both first and last.  The 
> times tended to decline on replication, with the "for loop" time for N = 
> 1e5 = 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop" 
> last), and 118.64 ("for loop" first again). 
> 
>       Conclusions: 
>      
>       (1) Apparently, in some cases, R picks up speed upon replication.
> 
>       (2) The first 3 times for the "for loop" with N = 1e5 made me 
> wonder if there was an order effect, with the "for loop" being longer in 
> the first position.  However, the last run with the "for loop" again 
> first had the shortest time of 118.64, contradicting that hypothesis. 
> 
>       By comparison, I also tried this under S-Plus 6.2: 
> 
> *** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first): 
>                            10  100  1000 10000  100000
>                  for loop 0.01 0.05 0.331 3.976 273.073
>          gen e + for loop 0.00 0.04 0.320 3.154  29.112
> create storage + for loop 0.01 0.03 0.231 2.113  22.242
>                    sapply 0.00 0.04 0.380 4.757  23.003
> 
>       The script I used appears below.  As Peter said, "the only really 
> crucial [issue] is to avoid the inefficient append by preallocating" the 
> vectors to be generated.  Moreover, this is only an issue for long loops, 
> with a threshold of between 1e4 and 1e5 in this example.  For shorter 
> loops, the programmer's time is far more valuable. 
> 
> Enjoy.  spencer graves
> ####################
> 
> 
> N.gen <- c(10, 100, 1000, 10000, 1e5)
> mtds <- c("for loop", "gen e + for loop", "create storage + for loop",
>     "sapply", "replicate")
> m <- length(N.gen)   
> ellapsed.time <- array(NA, dim=c(m, length(mtds)))
> dimnames(ellapsed.time) <- list(N.gen, mtds)
>    
> for(iN in 1:m){
>     cat("\n", iN, "")
>     N <- N.gen[iN]
> #for loop
> set.seed(123)
> start.time <- proc.time()
> f<-function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) }
> v=c()
> x=.1 # starting point
> for (i in 1:N) { x=f(x); v=append(v,x) }
> ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]   
> cat(mtds[1], "")
> 
> #gen e + for loop
> set.seed(123)
> start.time <- proc.time()
> e <- 0.001*rnorm(N)
> X <- rep(0.1, N+1)
> for(i in 2:(N+1))
>     X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
> ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
> cat(mtds[2], "")
> 
> #create storage + for loop 
> set.seed(123)
> start.time <- proc.time()
> V <- numeric(N)
> xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
> ellapsed.time[iN, "create storage + for loop"] <- 
> (proc.time()-start.time)[3]
> cat(mtds[3], "")
> 
> #sapply
> set.seed(123)
> start.time <- proc.time()
> xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
> ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]   
> cat(mtds[4], "")
> 
> if(!is.null(version$language)){
> #replicate
> set.seed(123)
> start.time <- proc.time()
> z <- .1 ; vr <- replicate(N, z <<- f(z))
> ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
> cat(mtds[5], "")
> }
> 
> }
> 
> t(ellapsed.time)
> #############################
> Peter Dalgaard wrote:
> 
> >Christophe Pallier <pallier at lscp.ehess.fr> writes:
> >
> >>Fred J. wrote:
> >>
> >>>I need to generate a data set based on this equation
> >>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
> >>>N(0, 0.001) random variable
> >>>I need say 100 values.
> >>>
> >>>How do I do this?
> >>>      
> >>>
> >>I assume X(t) and x(t) are the same (?).
> >>
> >>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
> >>v=c()
> >>x=.1 # starting point
> >>for (i in 1:100) { x=f(x); v=append(v,x) }
> >>
> >>There may be smarter ways...
> >
> >Yes, but the only really crucial one is to avoid the inefficient append by
> >preallocating the v: 
> >
> >v <- numeric(100)
> >x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
> >
> >apart from that you can use implicit loops:
> >
> >x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
> >
> >or
> >
> >z <- .1 ; v <- replicate(100, z <<- f(z))
> >
> >(You cannot use x there because of a variable capture issue which is a
> >bit of a bug. I intend to fix it for 1.9.0.)
> >
> 

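For reference, a reduced sketch of the two extremes timed in the quoted
script: growing the result with append() versus filling a preallocated
vector.  The function names grow.by.append and fill.preallocated are
invented here, and N is kept small so that it runs quickly.  With the same
seed both should give the same values; only the cost differs.

```r
# Logistic-map step with small Gaussian noise, as in the quoted script.
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, 0.001)

# Grow the result one element at a time (a new vector is built on every step).
grow.by.append <- function(N, x0 = 0.1) {
  v <- c()
  x <- x0
  for (i in seq_len(N)) { x <- f(x); v <- append(v, x) }
  v
}

# Allocate the full vector once, then fill it element by element.
fill.preallocated <- function(N, x0 = 0.1) {
  v <- numeric(N)
  x <- x0
  for (i in seq_len(N)) { x <- f(x); v[i] <- x }
  v
}

N <- 10000
set.seed(123); t.append   <- system.time(v1 <- grow.by.append(N))[3]
set.seed(123); t.prealloc <- system.time(v2 <- fill.preallocated(N))[3]

identical(v1, v2)   # same random stream, same recursion
c(append = unname(t.append), preallocated = unname(t.prealloc))
```

Raising N towards 1e5 should make the gap between the two timings much
more visible, in line with the tables quoted above.
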
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



