[R] applying data generating function
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Mar 8 08:29:48 CET 2004
You need to run gc() before running such timings in R, as the first run
often has to pay for a level-0 garbage collection. That is normally the
cause of (1), although I haven't seen differences as large as 10 secs (but
have no idea of the speed of your machine, and have seen 3 secs).
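A minimal sketch of that advice, assuming a current R in which system.time() accepts a gcFirst argument. The helper time_once() and the toy workload work() are made-up names for illustration, not part of the script quoted below:

```r
# Hypothetical helper: force a garbage collection, then time one call,
# so the measured run does not pay for collecting earlier garbage.
time_once <- function(fun) {
  invisible(gc())                      # collect before the clock starts
  start <- proc.time()
  fun()
  (proc.time() - start)[["elapsed"]]   # elapsed wall-clock seconds
}

# Toy workload standing in for one of the loops being compared
work <- function() {
  v <- numeric(1e4)
  for (i in seq_along(v)) v[i] <- i^2
  v
}

t1 <- time_once(work)

# system.time() can request the same collection itself
t2 <- system.time(work(), gcFirst = TRUE)[["elapsed"]]
```

Timing each method several times after a gc() and comparing the spread gives a fairer picture than a single first run.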
On Sun, 7 Mar 2004, Spencer Graves wrote:
> Peter's enumeration of alternatives inspired me to compare compute
> times for N = 10^(1:5), with the following results:
>
> *** R 1.8.1 under Windows 2000, IBM Thinkpad T30:
>                              10   100  1000 10000  1e+05
> for loop                      0  0.01  0.09  1.27 192.05
> gen e + for loop              0  0.00  0.03  0.22   2.58
> create storage + for loop     0  0.01  0.05  0.34   3.45
> sapply                        0  0.00  0.04  0.28   3.82
> replicate                     0  0.01  0.05  0.29   4.02
>
> I repeated this with the "for loop" both first and last. The
> times tended to decline on replication: the "for loop" times for N =
> 1e5 were 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop"
> last), and 118.64 ("for loop" first again).
>
> Conclusions:
>
> (1) Apparently, in some cases, R picks up speed upon replication.
>
> (2) The first 3 times for the "for loop" with N = 1e5 made me
> wonder if there was an order effect, with the "for loop" being longer in
> the first position. However, the last run with the "for loop" again
> first had the shortest time of 118.64, contradicting that hypothesis.
>
> By comparison, I also tried this under S-Plus 6.2:
>
> *** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first):
>                              10   100   1000  10000   100000
> for loop                   0.01  0.05  0.331  3.976  273.073
> gen e + for loop           0.00  0.04  0.320  3.154   29.112
> create storage + for loop  0.01  0.03  0.231  2.113   22.242
> sapply                     0.00  0.04  0.380  4.757   23.003
>
> The script I used appears below. As Peter said, "the only really
> crucial [issue] is to avoid the inefficient append by preallocating" the
> vectors to be generated. Moreover, this is only an issue for long loops,
> with a threshold of between 1e4 and 1e5 in this example. For shorter
> loops, the programmers' time is far more valuable.
>
> Enjoy. spencer graves
> ####################
>
>
> N.gen <- c(10, 100, 1000, 10000, 1e5)
> mtds <- c("for loop", "gen e + for loop", "create storage + for loop",
>           "sapply", "replicate")
> m <- length(N.gen)
> ellapsed.time <- array(NA, dim=c(m, length(mtds)))
> dimnames(ellapsed.time) <- list(N.gen, mtds)
>
> for(iN in 1:m){
>   cat("\n", iN, "")
>   N <- N.gen[iN]
>   # for loop
>   set.seed(123)
>   start.time <- proc.time()
>   f <- function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) }
>   v=c()
>   x=.1 # starting point
>   for (i in 1:N) { x=f(x); v=append(v,x) }
>   ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]
>   cat(mtds[1], "")
>
>   # gen e + for loop
>   set.seed(123)
>   start.time <- proc.time()
>   e <- 0.001*rnorm(N)
>   X <- rep(0.1, N+1)
>   for(i in 2:(N+1))
>     X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
>   ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
>   cat(mtds[2], "")
>
>   # create storage + for loop
>   set.seed(123)
>   start.time <- proc.time()
>   V <- numeric(N)
>   xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
>   ellapsed.time[iN, "create storage + for loop"] <-
>     (proc.time()-start.time)[3]
>   cat(mtds[3], "")
>
>   # sapply
>   set.seed(123)
>   start.time <- proc.time()
>   xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
>   ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]
>   cat(mtds[4], "")
>
>   if(!is.null(version$language)){
>     # replicate
>     set.seed(123)
>     start.time <- proc.time()
>     z <- .1 ; vr <- replicate(N, z <<- f(z))
>     ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
>     cat(mtds[5], "")
>   }
>
> }
>
> t(ellapsed.time)
> #############################
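As a compact illustration of the three loop variants timed above, here is a sketch at a single small N. The variable names v_grow, v_pre and v_noise are chosen for this example only. Under the same seed and R's default generators, the three series should agree, so the comparison is purely about speed:

```r
f <- function(x) 3.8 * x * (1 - x) + rnorm(1, 0, 0.001)
N <- 1000

# 1. Growing the result: append() copies the whole vector on every pass
set.seed(123)
x <- 0.1
v_grow <- c()
for (i in seq_len(N)) { x <- f(x); v_grow <- append(v_grow, x) }

# 2. Preallocated storage: each pass writes one element in place
set.seed(123)
x <- 0.1
v_pre <- numeric(N)
for (i in seq_len(N)) { x <- f(x); v_pre[i] <- x }

# 3. Noise drawn up front in one vectorised call, then a plain recursion
set.seed(123)
e <- rnorm(N, 0, 0.001)
x <- 0.1
v_noise <- numeric(N)
for (i in seq_len(N)) { x <- 3.8 * x * (1 - x) + e[i]; v_noise[i] <- x }

same_grow_pre  <- identical(v_grow, v_pre)
same_pre_noise <- isTRUE(all.equal(v_pre, v_noise))
```

The recursion itself cannot be vectorised because each value depends on the previous one; only the random draws and the storage can be taken out of the loop.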
> Peter Dalgaard wrote:
>
> >Christophe Pallier <pallier at lscp.ehess.fr> writes:
> >
> >
> >
> >>Fred J. wrote:
> >>
> >>
> >>
> >>>I need to generate a data set based on this equation
> >>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
> >>>N(0, 0.001) random variable
> >>>I need say 100 values.
> >>>
> >>>How do I do this?
> >>>
> >>>
> >>I assume X(t) and x(t) are the same (?).
> >>
> >>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
> >>v=c()
> >>x=.1 # starting point
> >>for (i in 1:100) { x=f(x); v=append(v,x) }
> >>
> >>There may be smarter ways...
> >>
> >>
> >
> >Yes, but the only really crucial one is to avoid the inefficient append by
> >preallocating the v:
> >
> >v <- numeric(100)
> >x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
> >
> >apart from that you can use implicit loops:
> >
> >x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
> >
> >or
> >
> >z <- .1 ; v <- replicate(100, z <<- f(z))
> >
> >(You cannot use x there because of a variable capture issue which is a
> >bit of a bug. I intend to fix it for 1.9.0.)
> >
> >
> >
>
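The implicit-loop versions quoted above rely on `<<-` updating a variable in the calling workspace. One way to keep that state contained is to wrap it in a function, so `<<-` only reaches the enclosing function's own variable. This is a sketch, not anything proposed in the thread: simulate_map() and its arguments are hypothetical names, and the Reduce() variant is an alternative technique swapped in for comparison:

```r
# Hypothetical wrapper: x lives in simulate_map's frame, not the workspace
simulate_map <- function(n, x0 = 0.1, sd = 0.001) {
  x <- x0
  step <- function(i) {
    x <<- 3.8 * x * (1 - x) + rnorm(1, 0, sd)  # updates x one level up
    x
  }
  vapply(seq_len(n), step, numeric(1))
}

set.seed(123)
v <- simulate_map(100)

# Alternative without <<-: fold over pre-drawn noise with Reduce()
set.seed(123)
e <- rnorm(100, 0, 0.001)
v2 <- Reduce(function(x, e_t) 3.8 * x * (1 - x) + e_t, e,
             accumulate = TRUE, 0.1)[-1]       # drop the starting value

same_series <- isTRUE(all.equal(v, v2))
```

Neither form is likely to be faster than the preallocated loop; the point is only that the running value no longer has to be a global.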
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595