[R] applying data generating function

Liaw, Andy andy_liaw at merck.com
Tue Mar 9 00:50:29 CET 2004



> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
> Sent: Monday, March 08, 2004 11:21 AM
> To: Prof Brian Ripley
> Cc: Fred J.; r-help; Peter Dalgaard
> Subject: Re: [R] applying data generating function
> 
> 
>       With "gc()" right before each call to "proc.time", as 
> Brian Ripley 
> and Gabor Grothendieck suggested, the times were substantially more 
> stable.  For the for loop, extending the vector with each of 1e5 
> iterations, I got 181.25, 181.27, 182.72, 182.44, and 182.56.  The 
> averages of the last 3 of these tests are as follows: 
> 
>                           10  100 1000 10000  1e+05
> for loop                   0 0.01 0.05  1.13 182.14
> gen e + for loop           0 0.00 0.03  0.26   2.58
> create storage + for loop  0 0.00 0.04  0.39   3.94
> sapply                     0 0.00 0.03  0.32   4.05
> replicate                  0 0.00 0.03  0.31   3.55
> 
> Without "gc()", I got 192.05, 182.02, 126.04, 130.30, and 118.64 for 
> extending the vector with each for loop iteration. 
> 
>       Three more observations about this: 
> 
>       1.  Without "gc()", the times started higher but declined by 
> roughly a third.  This suggests that R may actually be storing 
> intermediate "semi-compiled" code in "garbage" and using it when the 
> situation warrants -- but "gc()" discards it. 
> 
>       2.  Increasing N from 1e4 to 1e5 increased the time NOT by a 
> factor of 10 but by a factor of 161 = 182/1.13 when the length of the 
> vector was extended in each iteration. 

Growing a vector inside for loop is expensive and inefficient because R
needs to allocate a new vector of sufficient length, copy the data over,
then de-reference the old vector.  (This my understanding of how it works.
Hopefully I'm not too far off...)

HTH,
Andy
 
>       3.  The fastest was indeed to generate "e" and allocate all 
> required storage before entering the loop, but the major 
> improvement was 
> due to allocation of storage before initiating the for loop.  
> However, 
> with 1000 or fewer iterations, the difference was hardly detectable. 
> 
>       Best Wishes,
>       spencer graves
> 
> Prof Brian Ripley wrote:
> 
> >You need to run gc() before running such timings in R, as 
> the first run 
> >often has to pay for a level-0 garbage collection.  That is 
> normally the 
> >cause of (1), although I haven't seen differences as large 
> as 10 secs (but 
> >have no idea of the speed of your machine, and have seen 3 secs).
> >
> >On Sun, 7 Mar 2004, Spencer Graves wrote:
> >
> >  
> >
> >>      Peter's enumeration of alternatives inspired me to 
> compare compute 
> >>times for N = 10^(2:5), with the following results:      
> >>
> >>*** R 1.8.1 under Windows 2000, IBM Thinkpad T30: 
> >>                          10  100 1000 10000  1e+05
> >>for loop                   0 0.01 0.09  1.27 192.05
> >>gen e + for loop           0 0.00 0.03  0.22   2.58
> >>create storage + for loop  0 0.01 0.05  0.34   3.45
> >>sapply                     0 0.00 0.04  0.28   3.82
> >>replicate                  0 0.01 0.05  0.29   4.02
> >>
> >>      I repeated this with the "for loop" both first and last.  The 
> >>times tended to decline on replication, with the "for loop" 
> time for N = 
> >>1e5 = 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop" 
> >>last), and 118.64 ("for loop" first again). 
> >>
> >>      Conclusions: 
> >>     
> >>      (1) Apparently, in some cases, R picks up speed upon 
> replication
> >>
> >>      (2) The first 3 times for the "for loop" with N = 1e5 made me 
> >>wonder if there was an order effect, with the "for loop" 
> being longer in 
> >>the first position.  However, the last run with the "for 
> loop" again 
> >>first had the shortest time of 118.64, contradicting that 
> hypothesis. 
> >>
> >>      By comparison, I also tried this under S-Plus 6.2: 
> >>
> >>*** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first): 
> >>                           10  100  1000 10000  100000
> >>                 for loop 0.01 0.05 0.331 3.976 273.073
> >>         gen e + for loop 0.00 0.04 0.320 3.154  29.112
> >>create storage + for loop 0.01 0.03 0.231 2.113  22.242
> >>                   sapply 0.00 0.04 0.380 4.757  23.003
> >>
> >>      The script I used appears below.  As Peter said, "the 
> only really 
> >>crucial [issue] is to avoid the inefficient append by 
> preallocating" the 
> >>vectors to be generated.  Moreover, this is only an issue 
> for long loop, 
> >>with a threshold of between 1e4 and 1e5 in this example.  
> For shorter 
> >>loops, the programmers' time is far more valuable. 
> >>
> >>Enjoy.  spencer graves
> >>####################
> >>
> >>
> >>N.gen <- c(10, 100, 1000, 10000, 1e5)
> >>mtds <- c("for loop", "gen e + for loop", "create storage + 
> for loop",
> >>    "sapply", "replicate")
> >>m <- length(N.gen)   
> >>ellapsed.time <- array(NA, dim=c(m, length(mtds)))
> >>dimnames(ellapsed.time) <- list(N.gen, mtds)
> >>   
> >>for(iN in 1:m){
> >>    cat("\n", iN, "")
> >>    N <- N.gen[iN]
> >>#for loop
> >>set.seed(123)
> >>start.time <- proc.time()
> >>f<-function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) }
> >>v=c()
> >>x=.1 # starting point
> >>for (i in 1:N) { x=f(x); v=append(v,x) }
> >>ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]   
> >>cat(mtds[1], "")
> >>
> >>#gen e + for loop
> >>set.seed(123)
> >>start.time <- proc.time()
> >>e <- 0.001*rnorm(N)
> >>X <- rep(0.1, N+1)
> >>for(i in 2:(N+1))
> >>    X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
> >>ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
> >>cat(mtds[2], "")
> >>
> >>#create storage + for loop 
> >>set.seed(123)
> >>start.time <- proc.time()
> >>V <- numeric(N)
> >>xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
> >>ellapsed.time[iN, "create storage + for loop"] <- 
> >>(proc.time()-start.time)[3]
> >>cat(mtds[3], "")
> >>
> >>#sapply
> >>set.seed(123)
> >>start.time <- proc.time()
> >>xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
> >>ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]   
> >>cat(mtds[4], "")
> >>
> >>if(!is.null(version$language)){
> >>#replicate
> >>set.seed(123)
> >>start.time <- proc.time()
> >>z <- .1 ; vr <- replicate(N, z <<- f(z))
> >>ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
> >>cat(mtds[5], "")
> >>}
> >>
> >>}
> >>
> >>t(ellapsed.time)
> >>#############################
> >>Peter Dalgaard wrote:
> >>
> >>    
> >>
> >>>Christophe Pallier <pallier at lscp.ehess.fr> writes:
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>>>Fred J. wrote:
> >>>>
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>>>>I need to generate a data set based on this equation
> >>>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
> >>>>>N(0,0,001) random variable
> >>>>>I need say 100 values.
> >>>>>
> >>>>>How do I do this?
> >>>>>     
> >>>>>
> >>>>>          
> >>>>>
> >>>>I assume X(t) and x(t) are the same (?).
> >>>>
> >>>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
> >>>>v=c()
> >>>>x=.1 # starting point
> >>>>for (i in 1:100) { x=f(x); v=append(v,x) }
> >>>>
> >>>>There may be smarter ways...
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>>Yes, but the only really crucial one is to avoid the 
> inefficient append  by
> >>>preallocating the v: 
> >>>
> >>>v <- numeric(100)
> >>>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
> >>>
> >>>apart from that you can use implicit loops:
> >>>
> >>>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
> >>>
> >>>or
> >>>
> >>>z <- .1 ; v <- replicate(100, z <<- f(z))
> >>>
> >>>(You cannot use x there because of a variable capture 
> issue which is a
> >>>bit of a bug. I intend to fix it for 1.9.0.)
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> >>
> >>
> >>    
> >>
> >
> 
> >  
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 


------------------------------------------------------------------------------
Notice:  This e-mail message, together with any attachments,...{{dropped}}




More information about the R-help mailing list