[R] Is R the right choice for simulating first passage times of random walks?
Paul Menzel
paulepanter at users.sourceforge.net
Mon Aug 1 14:49:42 CEST 2011
On Sunday, 31.07.2011 at 23:32 -0500, R. Michael Weylandt wrote:
> Glad to help -- I haven't taken a look at Dennis' solution (which may be far
> better than mine), but if you do want to keep going down the path outlined
> below you might consider the following:
I will try Dennis’ solution right away, but I looked at your suggestions
first. Thank you very much.
> Instead of throwing away a simulation if something starts negative, why not
> just multiply the entire sample by -1: that lets you still use the sample
> and saves you some computations: of course you'll have to remember to adjust
> your final results accordingly.
That is a nice suggestion. For a symmetric random walk this is indeed
possible, and it is equivalent to looking at when the walk first hits zero.
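The flip can be sketched in a few lines; a minimal sketch with an arbitrary seed and walk length:

```r
## Reuse a sample instead of discarding it: for a symmetric simple random
## walk, negating a path yields another equally likely path, so a walk
## whose first step is +1 can be flipped into one starting with -1.
set.seed(42)                                  # arbitrary seed
x <- cumsum(sample(c(-1L, 1L), 20, replace = TRUE))
if (x[1] > 0) x <- -x                         # flip instead of throwing away
stopifnot(x[1] == -1L)                        # now guaranteed to start negative
```

The adjustment mentioned above is simply remembering that results for flipped
walks describe the mirrored path.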
> This might avoid the loop:
>
> x = ## Whatever x is.
> xLag = c(0,x[-length(x)]) # 'lag' x by 1 step.
> which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count
> things, this +1 may be extraneous.
>
> The inner expression sets a 0 except where there is a switch from negative
> to positive and a one there: the which.max function returns the location of
> the first maximum, which is the first 1, in the vector. If you are
> guaranteed the run starts negative, then the location of the first positive
> should give you the length of the negative run.
That is the same idea as Bill's [1]. The problem is that when the walk
never returns to zero within a sample, `which.max()` applied to an
all-FALSE vector returns 1 [2]. That is no problem, though, as long as we
do not have to worry about a walk starting with a positive value; and
adding 1 (+1) can be omitted when we count the epochs of first hitting 0
instead of how long the walk stayed negative, which is always one less.
Additionally, the check `(x>=0) & (xLag <0)` is redundant when we know we
start with a negative value; `(x>=0)` should be good enough in this case.
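The pitfall is easy to demonstrate with two made-up sample paths:

```r
## which.max() on a logical vector returns the index of the first TRUE,
## but on an all-FALSE vector it still returns 1 (the position of the
## first maximum), so "never returned" looks like "returned at epoch 1".
x <- c(-1L, -2L, -1L, 0L, 1L)    # first non-negative value at epoch 4
y <- c(-1L, -2L, -3L, -2L, -1L)  # never becomes non-negative
which.max(x >= 0)                # 4
which.max(y >= 0)                # 1, although y never crosses
any(y >= 0)                      # FALSE -- the guard that disambiguates
```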
> This all gives you,
>
> f4 <- function(n = 100000, # number of simulations
> length = 100000) # length of iterated sum
> {
> R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
> R = apply(R,1,cumsum)
> R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
The line above seems to address the columns instead of the rows. I think
the following is correct, since after the `apply()` above the random
walks are in the columns.
R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)]
> fTemp <- function(x) {
>     xLag = c(0,x[-length(x)])
>     return(which.max((x>=0) & (xLag <0))+1)
> }
> countNegative = apply(R,2,fTemp)
> tabulate(as.vector(countNegative), length)
> }
>
> That just crashed my computer though, so I wouldn't recommend it for large
> n,length.
Welcome to my world. I would never have thought that simulating random
walks with a length of, say, a million would create that much data and
push common desktop systems with, say, 4 GB of RAM to their limits.
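A quick back-of-the-envelope calculation, using the default sizes from `f4` below, shows why:

```r
## Memory footprint of the step matrix alone, before apply() creates
## a transposed copy: an integer in R takes 4 bytes.
n   <- 100000                 # number of simulations (f4 default)
len <- 100000                 # walk length (f4 default)
gib <- n * len * 4 / 2^30
gib                           # roughly 37 GiB -- far beyond a 4 GB desktop
```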
> Instead, you can help a little by combining the lagging and the &
> all in one.
>
> f4 <- function(n = 100000, llength = 100000)
> {
> R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n)
> R = apply(R,1,cumsum)
> R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row
> R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0)
> countNegative = apply(R,1,which.max) + 1
> return (tabulate(as.vector(countNegative), length) )
>
> }
I left that one out, because as written above the check can be
shortened.
> Of course, this is all starting to approach a very specific question that
> could actually be approached much more efficiently if it's your end goal
> (though I think I remember from your first email a different end goal):
That is true. But learning some optimization techniques on a simple
example is much appreciated and will hopefully help me later with the
iterated random walk cases.
> We can use the symmetry and "restart"ability of RW to do the following:
>
> x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T))
> D = diff(which(x == 0))
Nice!
> This will give you a vector of how long x stays positive or negative at a
> time. Thinking through some simple translations lets you see that this set
> has the same distribution as how long a RW that starts negative stays
> negative.
I have to write those translations down. At first sight, though, we again
need to handle the case where the walk stays negative the whole time: `D`
then has length 0, and we have to count that as a walk longer than
`BIGNUMBER`.
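That guard can be sketched as follows; the sample length here is kept small for illustration:

```r
## Excursion lengths via the "restart" trick, guarding the case where
## the walk never hits 0 and diff() would return a zero-length vector.
set.seed(1)
x <- cumsum(sample(c(-1L, 1L), 1000, replace = TRUE))
zeros <- which(x == 0)
D <- diff(c(0L, zeros))   # prepend epoch 0 so the first excursion counts
if (length(D) == 0) {
  ## no return to 0 at all: the single excursion exceeds the sample length
  message("walk stayed off 0 for the whole sample")
} else {
  counts <- tabulate(D, max(D))
}
```

Since a ±1 walk can only be at 0 on even epochs, every entry of `D` is even.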
> Again, this is only good for answering a very specific question
> about random walks and may not be useful if you have other more complicated
> questions in sight.
Just testing for 0 will not be enough for iterated random walks, since an
iterated random walk can go from negative to non-negative without being
zero at that epoch.
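A sketch of what that looks like; the construction of the iterated walk here is my reading of the problem, not code from this thread:

```r
## An iterated random walk: partial sums of a simple random walk.
## Its increments are the walk's values, so steps can be large and the
## path can jump from negative to positive without ever sitting at 0.
set.seed(7)                                          # arbitrary seed
s <- cumsum(sample(c(-1L, 1L), 50, replace = TRUE))  # ordinary walk
y <- cumsum(s)                                       # iterated walk
## detect negative -> non-negative crossings by sign change, not by y == 0
cross <- which(y[-length(y)] < 0 & y[-1] >= 0) + 1   # epochs of first entry
```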
I implemented all your suggestions and got the following.
-------- 8< -------- code -------- >8 --------
f4 <- function(n = 100000,      # number of simulations
               length = 100000) # length of iterated sum
{
    R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
    R = apply(R, 1, cumsum) # cumsum over each row; note the result is
                            # transposed, so the walks end up in the columns
    fTemp <- function(x) {
        if (x[1] >= 0) {
            return(1)
        }
        for (i in 1:(length - 1)) { # parentheses matter: 1:length-1 starts at 0
            if (x[i] < 0 && x[i + 1] >= 0) {
                return(as.integer(i/2 + 2))
            }
        }
    }
    countNegative = apply(R, 2, fTemp)
    tabulate(as.vector(countNegative), length)
}

f5 <- function(n = 100000,      # number of simulations
               length = 100000) # length of iterated sum
{
    R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
    R = apply(R, 1, cumsum)
    R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # first value positive: flip the entire column
    R <- R >= 0
    countNegative = apply(R, 2, which.max)
    tabulate(as.vector(countNegative), length)
}

f6 <- function(n = 100000,      # number of simulations
               length = 100000) # length of iterated sum
{
    x = cumsum(sample(c(-1L,1L), length*n, replace=T))
    D = diff(which(c(0, x) == 0))
    tabulate(D, max(D))
}
-------- 8< -------- code -------- >8 --------
The timings differ quite a lot, which is expected though.
> # f1 uses only for loops, but does only half the calculations
> # and does not yet flip random walks starting with a positive value.
> set.seed(1) ; system.time( z1 <- f1(300, 1e5) )
   user  system elapsed
  2.700   0.008   2.729
> # f1 adapted with flips
> set.seed(1) ; system.time( z1f <- f1withflip(300, 1e5) )
   user  system elapsed
  4.457   0.004   4.475
> set.seed(1) ; system.time( z4 <- f4(300, 1e5) )
   user  system elapsed
  8.033   0.380   8.739
> set.seed(1) ; system.time( z5 <- f5(300, 1e5) )
   user  system elapsed
  9.640   0.812  10.588
> set.seed(1) ; system.time( z6 <- f6(300, 1e5) )
   user  system elapsed
  4.208   0.328   4.606
So `f6` seems to be the most efficient approach right now, and it is even
slightly faster than `f1` with its for loops. But we have to keep in mind
that the functions operate on different data sets even though
`set.seed(1)` is used, and `f6` treats the problem totally differently.
One other thought: besides reusing the walks that start with a positive
term by flipping them, we can probably also take the backward/reverse
walk (dual problem). I will try that too.
Thank you very much,
Paul
[1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html
[2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html