[R] applying data generating function
Liaw, Andy
andy_liaw at merck.com
Tue Mar 9 00:50:29 CET 2004
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
> Sent: Monday, March 08, 2004 11:21 AM
> To: Prof Brian Ripley
> Cc: Fred J.; r-help; Peter Dalgaard
> Subject: Re: [R] applying data generating function
>
>
> With "gc()" right before each call to "proc.time", as
> Brian Ripley
> and Gabor Grothendieck suggested, the times were substantially more
> stable. For the for loop, extending the vector with each of 1e5
> iterations, I got 181.25, 181.27, 182.72, 182.44, and 182.56. The
> averages of the last 3 of these tests are as follows:
>
> 10 100 1000 10000 1e+05
> for loop 0 0.01 0.05 1.13 182.14
> gen e + for loop 0 0.00 0.03 0.26 2.58
> create storage + for loop 0 0.00 0.04 0.39 3.94
> sapply 0 0.00 0.03 0.32 4.05
> replicate 0 0.00 0.03 0.31 3.55
>
> Without "gc()", I got 192.05, 182.02, 126.04, 130.30, and 118.64 for
> extending the vector with each for loop iteration.
>
> Three more observations about this:
>
> 1. Without "gc()", the times started higher but declined by
> roughly a third. This suggests that R may actually be storing
> intermediate "semi-compiled" code in "garbage" and using it when the
> situation warrants -- but "gc()" discards it.
>
> 2. Increasing N from 1e4 to 1e5 increased the time NOT by a
> factor of 10 but by a factor of 161 = 182/1.13 when the length of the
> vector was extended in each iteration.
Growing a vector inside for loop is expensive and inefficient because R
needs to allocate a new vector of sufficient length, copy the data over,
then de-reference the old vector. (This my understanding of how it works.
Hopefully I'm not too far off...)
HTH,
Andy
> 3. The fastest was indeed to generate "e" and allocate all
> required storage before entering the loop, but the major
> improvement was
> due to allocation of storage before initiating the for loop.
> However,
> with 1000 or fewer iterations, the difference was hardly detectable.
>
> Best Wishes,
> spencer graves
>
> Prof Brian Ripley wrote:
>
> >You need to run gc() before running such timings in R, as
> the first run
> >often has to pay for a level-0 garbage collection. That is
> normally the
> >cause of (1), although I haven't seen differences as large
> as 10 secs (but
> >have no idea of the speed of your machine, and have seen 3 secs).
> >
> >On Sun, 7 Mar 2004, Spencer Graves wrote:
> >
> >
> >
> >> Peter's enumeration of alternatives inspired me to
> compare compute
> >>times for N = 10^(2:5), with the following results:
> >>
> >>*** R 1.8.1 under Windows 2000, IBM Thinkpad T30:
> >> 10 100 1000 10000 1e+05
> >>for loop 0 0.01 0.09 1.27 192.05
> >>gen e + for loop 0 0.00 0.03 0.22 2.58
> >>create storage + for loop 0 0.01 0.05 0.34 3.45
> >>sapply 0 0.00 0.04 0.28 3.82
> >>replicate 0 0.01 0.05 0.29 4.02
> >>
> >> I repeated this with the "for loop" both first and last. The
> >>times tended to decline on replication, with the "for loop"
> time for N =
> >>1e5 = 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop"
> >>last), and 118.64 ("for loop" first again).
> >>
> >> Conclusions:
> >>
> >> (1) Apparently, in some cases, R picks up speed upon
> replication
> >>
> >> (2) The first 3 times for the "for loop" with N = 1e5 made me
> >>wonder if there was an order effect, with the "for loop"
> being longer in
> >>the first position. However, the last run with the "for
> loop" again
> >>first had the shortest time of 118.64, contradicting that
> hypothesis.
> >>
> >> By comparison, I also tried this under S-Plus 6.2:
> >>
> >>*** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first):
> >> 10 100 1000 10000 100000
> >> for loop 0.01 0.05 0.331 3.976 273.073
> >> gen e + for loop 0.00 0.04 0.320 3.154 29.112
> >>create storage + for loop 0.01 0.03 0.231 2.113 22.242
> >> sapply 0.00 0.04 0.380 4.757 23.003
> >>
> >> The script I used appears below. As Peter said, "the
> only really
> >>crucial [issue] is to avoid the inefficient append by
> preallocating" the
> >>vectors to be generated. Moreover, this is only an issue
> for long loop,
> >>with a threshold of between 1e4 and 1e5 in this example.
> For shorter
> >>loops, the programmers' time is far more valuable.
> >>
> >>Enjoy. spencer graves
> >>####################
> >>
> >>
> >>N.gen <- c(10, 100, 1000, 10000, 1e5)
> >>mtds <- c("for loop", "gen e + for loop", "create storage +
> for loop",
> >> "sapply", "replicate")
> >>m <- length(N.gen)
> >>ellapsed.time <- array(NA, dim=c(m, length(mtds)))
> >>dimnames(ellapsed.time) <- list(N.gen, mtds)
> >>
> >>for(iN in 1:m){
> >> cat("\n", iN, "")
> >> N <- N.gen[iN]
> >>#for loop
> >>set.seed(123)
> >>start.time <- proc.time()
> >>f<-function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) }
> >>v=c()
> >>x=.1 # starting point
> >>for (i in 1:N) { x=f(x); v=append(v,x) }
> >>ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]
> >>cat(mtds[1], "")
> >>
> >>#gen e + for loop
> >>set.seed(123)
> >>start.time <- proc.time()
> >>e <- 0.001*rnorm(N)
> >>X <- rep(0.1, N+1)
> >>for(i in 2:(N+1))
> >> X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
> >>ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
> >>cat(mtds[2], "")
> >>
> >>#create storage + for loop
> >>set.seed(123)
> >>start.time <- proc.time()
> >>V <- numeric(N)
> >>xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
> >>ellapsed.time[iN, "create storage + for loop"] <-
> >>(proc.time()-start.time)[3]
> >>cat(mtds[3], "")
> >>
> >>#sapply
> >>set.seed(123)
> >>start.time <- proc.time()
> >>xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
> >>ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]
> >>cat(mtds[4], "")
> >>
> >>if(!is.null(version$language)){
> >>#replicate
> >>set.seed(123)
> >>start.time <- proc.time()
> >>z <- .1 ; vr <- replicate(N, z <<- f(z))
> >>ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
> >>cat(mtds[5], "")
> >>}
> >>
> >>}
> >>
> >>t(ellapsed.time)
> >>#############################
> >>Peter Dalgaard wrote:
> >>
> >>
> >>
> >>>Christophe Pallier <pallier at lscp.ehess.fr> writes:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>Fred J. wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>I need to generate a data set based on this equation
> >>>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
> >>>>>N(0,0,001) random variable
> >>>>>I need say 100 values.
> >>>>>
> >>>>>How do I do this?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>I assume X(t) and x(t) are the same (?).
> >>>>
> >>>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
> >>>>v=c()
> >>>>x=.1 # starting point
> >>>>for (i in 1:100) { x=f(x); v=append(v,x) }
> >>>>
> >>>>There may be smarter ways...
> >>>>
> >>>>
> >>>>
> >>>>
> >>>Yes, but the only really crucial one is to avoid the
> inefficient append by
> >>>preallocating the v:
> >>>
> >>>v <- numeric(100)
> >>>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
> >>>
> >>>apart from that you can use implicit loops:
> >>>
> >>>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
> >>>
> >>>or
> >>>
> >>>z <- .1 ; v <- replicate(100, z <<- f(z))
> >>>
> >>>(You cannot use x there because of a variable capture
> issue which is a
> >>>bit of a bug. I intend to fix it for 1.9.0.)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> >>
> >>
> >>
> >>
> >
>
> >
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>
------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments,...{{dropped}}
More information about the R-help
mailing list