[R] speed issue in simulating a stochastic process
William Dunlap
wdunlap at tibco.com
Thu Nov 6 23:22:15 CET 2014
Loops are not slow, but your code did a lot of unneeded operations in each
loop.
E.g, you computed
D$id==i & D$t==t
for each row of D. That involves 2*nrow(D) equality tests for each of the
nrow(D)
rows, i.e., it is quadratic in N*T.
Then you did a data.frame replacement operation
D[k,]$y <- newValue
where k is D$id==1&D$t==t. This extracts the k'th row of D, then extracts
the 1-row 'y' column
from it, replaces it with the new value, then puts that row back into D.
If you must use
a data.frame, the equivalent
D$y[k] <- newValue
is probably much faster (data.frames are lists of columns, so replacing a
column is fast).
Using a matrix to organize things is less flexible, but faster because you
don't have to search
when you want to find the element for a given id and time - you just do a
little arithmetic to
get the offset from the start of the matrix.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Nov 6, 2014 at 2:05 PM, Matteo Richiardi <matteo.richiardi at gmail.com
> wrote:
> Hi William,
> that's super. Thanks a lot. I knew that R is slow with loops, but did not
> imagine so slow! B.t.w., what's the reason?
> Final question: in your code you have mean(M[t-1L,]): what is the 'L'
> for? I removed it at apparently the code produces the same output...
>
> Thanks again,
> Matteo
>
> On 6 November 2014 18:46, William Dunlap <wdunlap at tibco.com> wrote:
>
>> I find that representing the simulated data as a T row by N column matrix
>> allows for a clearer and faster simulation function. E.g., compare the
>> output of the following two functions, the first of which uses your code
>> and the second a matrix representation (which I convert to a data.frame at
>> the end so I can compare outputs easily). I timed both of them for T=10^3
>> times and N=50 individuals; both gave the same results and f1 was 10000
>> times faster than f0:
>> > set.seed(1); t0 <- system.time(s0 <- f0(N=50,T=1000))
>> > set.seed(1); t1 <- system.time(s1 <- f1(N=50,T=1000))
>> > rbind(t0, t1)
>> user.self sys.self elapsed user.child sys.child
>> t0 436.87 0.11 438.48 NA NA
>> t1 0.04 0.00 0.04 NA NA
>> > all.equal(s0, s1)
>> [1] TRUE
>>
>> The functions are:
>>
>> f0 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0)
>> {
>> # construct the data frame and initialize y
>> D <- data.frame(
>> id = rep(1:N,T),
>> t = rep(1:T, each = N),
>> y = rep(y0,N*T)
>> )
>>
>> # update y
>> for(t in 2:T){
>> ybar.L1 = mean(D[D$t==t-1,"y"])
>> for(i in 1:N){
>> epsilon = rnorm(1,mean=m_e,sd=sd_e)
>> D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon
>> }
>> }
>> D
>> }
>>
>> f1 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0)
>> {
>> # same process simulated using a matrix representation
>> # The T rows are times, the N columns are individuals
>> M <- matrix(y0, nrow=T, ncol=N)
>> if (T > 1) for(t in 2:T) {
>> ybar.L1 <- mean(M[t-1L,])
>> epsilon <- rnorm(N, mean=m_e, sd=sd_e)
>> M[t,] <- lambda * y0 + (1-lambda)*ybar.L1 + epsilon
>> }
>> # convert to the data.frame representation that f0 uses
>> tM <- t(M)
>> data.frame(id = as.vector(row(tM)), t = as.vector(col(tM)), y =
>> as.vector(tM))
>> }
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, Nov 6, 2014 at 6:47 AM, Matteo Richiardi <
>> matteo.richiardi at gmail.com> wrote:
>>
>>> I wish to simulate the following stochastic process, for i = 1...N
>>> individuals and t=1...T periods:
>>>
>>> y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t}
>>>
>>> where Ey_{t-1} is the average of y over the N individuals computed at
>>> time
>>> t-1.
>>>
>>> My solution (below) works but is incredibly slow. Is there a faster but
>>> still clear and readable alternative?
>>>
>>> Thanks a lot. Matteo
>>>
>>> rm(list=ls())
>>> library(plyr)
>>> y0 = 0
>>> lambda = 0.1
>>> N = 20
>>> T = 100
>>> m_e = 0
>>> sd_e = 1
>>>
>>> # construct the data frame and initialize y
>>> D = data.frame(
>>> id = rep(1:N,T),
>>> t = rep(1:T, each = N),
>>> y = rep(y0,N*T)
>>> )
>>>
>>> # update y
>>> for(t in 2:T){
>>> ybar.L1 = mean(D[D$t==t-1,"y"])
>>> for(i in 1:N){
>>> epsilon = rnorm(1,mean=m_e,sd=sd_e)
>>> D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon
>>> }
>>> }
>>>
>>> ybar <- ddply(D,~t,summarise,mean=mean(y))
>>>
>>> plot(ybar, col = "blue", type = "l")
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list