[R] applying data generating function

Mon Mar 8 05:15:41 CET 2004

Hi, Gabor: 

      Thanks for the "garbage collection" suggestion.  In this case, I 
can't see how it would change the results:  I developed the script 
in an S-Plus script window, then copied it into an R session that had 
only just been started.  Moreover, the times generally declined upon 
replication.  Do you think the time might INCREASE after "gc"? 
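
One way to check would be to time the same code with and without a garbage collection immediately beforehand. The following is only a minimal sketch, not part of the benchmark script: `time.once` is a hypothetical helper, `N <- 1e4` and the three repetitions are assumed values chosen to keep the run short, and it relies on the `gcFirst` argument of `system.time()` as found in current versions of R.

```r
# Hypothetical helper: elapsed seconds for one run of the append-based loop.
# gc.first = TRUE asks system.time() to collect garbage before timing starts.
time.once <- function(N, gc.first) {
  f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, 0.001)
  set.seed(123)
  timing <- system.time({
    v <- c()
    x <- 0.1
    for (i in seq_len(N)) { x <- f(x); v <- append(v, x) }
  }, gcFirst = gc.first)
  timing[["elapsed"]]
}

N <- 1e4  # assumed size, small enough to repeat several times
with.gc    <- sapply(1:3, function(i) time.once(N, gc.first = TRUE))
without.gc <- sapply(1:3, function(i) time.once(N, gc.first = FALSE))
rbind(with.gc, without.gc)
```

If the two rows look alike across repetitions, garbage collection is unlikely to explain the run-to-run differences.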

      Best Wishes,
      spencer graves

Gabor Grothendieck wrote:

>Regarding your comment on speed varying when replicating the
>runs, try running gc() first.
>
>---
>Date:   Sun, 07 Mar 2004 17:56:46 -0800 
>From:   Spencer Graves <spencer.graves at pdf.com>
>To:   Peter Dalgaard <p.dalgaard at biostat.ku.dk> 
>Cc:   Fred J. <phddas at yahoo.com>,r-help <r-help at stat.math.ethz.ch> 
>Subject:   Re: [R] applying data generating function 
>
> 
>Peter's enumeration of alternatives inspired me to compare compute 
>times for N = 10^(1:5), with the following results: 
>
>*** R 1.8.1 under Windows 2000, IBM Thinkpad T30: 
>                          10  100 1000 10000  1e+05
>for loop                   0 0.01 0.09  1.27 192.05
>gen e + for loop           0 0.00 0.03  0.22   2.58
>create storage + for loop  0 0.01 0.05  0.34   3.45
>sapply                     0 0.00 0.04  0.28   3.82
>replicate                  0 0.01 0.05  0.29   4.02
>
>I repeated this with the "for loop" both first and last. The 
>times tended to decline on replication: the "for loop" times for N = 
>1e5 were 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop" 
>last), and 118.64 ("for loop" first again). 
>
>Conclusions: 
>
>(1) Apparently, in some cases, R picks up speed upon replication.
>
>(2) The first 3 times for the "for loop" with N = 1e5 made me 
>wonder if there was an order effect, with the "for loop" being longer in 
>the first position. However, the last run with the "for loop" again 
>first had the shortest time of 118.64, contradicting that hypothesis. 
>
>By comparison, I also tried this under S-Plus 6.2: 
>
>*** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first): 
>                            10  100  1000 10000  100000
>for loop                  0.01 0.05 0.331 3.976 273.073
>gen e + for loop          0.00 0.04 0.320 3.154  29.112
>create storage + for loop 0.01 0.03 0.231 2.113  22.242
>sapply                    0.00 0.04 0.380 4.757  23.003
>
>The script I used appears below. As Peter said, "the only really 
>crucial [issue] is to avoid the inefficient append by preallocating" the 
>vectors to be generated. Moreover, this is an issue only for long loops, 
>with a threshold between 1e4 and 1e5 in this example. For shorter 
>loops, the programmer's time is far more valuable. 
>
>Enjoy. spencer graves
>####################
>
>
>N.gen <- c(10, 100, 1000, 10000, 1e5)
>mtds <- c("for loop", "gen e + for loop", "create storage + for loop",
>          "sapply", "replicate")
>m <- length(N.gen)
># elapsed seconds: one row per N, one column per method
>elapsed.time <- array(NA, dim=c(m, length(mtds)))
>dimnames(elapsed.time) <- list(N.gen, mtds)
>
>for(iN in 1:m){
>  cat("\n", iN, "")
>  N <- N.gen[iN]
>
>  # for loop: grow v with append() on every iteration
>  set.seed(123)
>  start.time <- proc.time()
>  f <- function(x.) { 3.8*x.*(1-x.) + rnorm(1, 0, .001) }
>  v <- c()
>  x <- .1 # starting point
>  for (i in 1:N) { x <- f(x); v <- append(v, x) }
>  elapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]
>  cat(mtds[1], "")
>
>  # gen e + for loop: draw all the noise first, then fill preallocated X
>  set.seed(123)
>  start.time <- proc.time()
>  e <- 0.001*rnorm(N)
>  X <- rep(0.1, N+1)
>  for(i in 2:(N+1))
>    X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
>  elapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
>  cat(mtds[2], "")
>
>  # create storage + for loop: preallocate V, call f() once per step
>  set.seed(123)
>  start.time <- proc.time()
>  V <- numeric(N)
>  xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
>  elapsed.time[iN, "create storage + for loop"] <-
>    (proc.time()-start.time)[3]
>  cat(mtds[3], "")
>
>  # sapply: implicit loop, state carried along with <<-
>  set.seed(123)
>  start.time <- proc.time()
>  xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
>  elapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]
>  cat(mtds[4], "")
>
>  # replicate: only where version$language is defined (it is "R" under R)
>  if(!is.null(version$language)){
>    set.seed(123)
>    start.time <- proc.time()
>    z <- .1 ; vr <- replicate(N, z <<- f(z))
>    elapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
>    cat(mtds[5], "")
>  }
>
>}
>
>t(elapsed.time)
>#############################
>Peter Dalgaard wrote:
>
>>Christophe Pallier <pallier at lscp.ehess.fr> writes:
>>
>>>Fred J. wrote:
>>>
>>>>I need to generate a data set based on this equation
>>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
>>>>N(0, 0.001) random variable
>>>>I need say 100 values.
>>>>
>>>>How do I do this?
>>>>
>>>I assume X(t) and x(t) are the same (?).
>>>
>>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
>>>v=c()
>>>x=.1 # starting point
>>>for (i in 1:100) { x=f(x); v=append(v,x) }
>>>
>>>There may be smarter ways...
>>>
>>Yes, but the only really crucial one is to avoid the inefficient append by
>>preallocating the v: 
>>
>>v <- numeric(100)
>>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
>>
>>apart from that you can use implicit loops:
>>
>>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
>>
>>or
>>
>>z <- .1 ; v <- replicate(100, z <<- f(z))
>>
>>(You cannot use x there because of a variable capture issue which is a
>>bit of a bug. I intend to fix it for 1.9.0.)
>>
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>  
>
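
The append-versus-preallocate point made in the quoted messages above can be checked on a small case. This is a rough sketch rather than the benchmark script itself: `grow` and `prealloc` are hypothetical function names, and `N <- 1e4` is an assumed size chosen so the comparison finishes quickly. With the same seed the two versions should produce the same sequence, so only the storage strategy differs.

```r
# One step of the noisy logistic map from the thread.
f <- function(x.) 3.8 * x. * (1 - x.) + rnorm(1, 0, 0.001)

# Grow the result with append(): the vector is copied on every iteration.
grow <- function(N, x0 = 0.1) {
  v <- c()
  x <- x0
  for (i in seq_len(N)) { x <- f(x); v <- append(v, x) }
  v
}

# Allocate the result once and fill it in place.
prealloc <- function(N, x0 = 0.1) {
  v <- numeric(N)
  x <- x0
  for (i in seq_len(N)) { x <- f(x); v[i] <- x }
  v
}

N <- 1e4  # assumed size for a quick check
set.seed(123); t.grow <- system.time(v.grow <- grow(N))[["elapsed"]]
set.seed(123); t.pre  <- system.time(v.pre  <- prealloc(N))[["elapsed"]]

identical(v.grow, v.pre)            # same seed, same sequence of draws
c(grow = t.grow, prealloc = t.pre)  # elapsed seconds; values vary by machine
```

Because `append()` copies the whole vector each time, the work in `grow` rises roughly with the square of N, which fits the sharp jump in the "for loop" row between 1e4 and 1e5.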