[R] Methods stopping calculations when a first element is found? (was: Is R the right choice for simulating first passage times of random walks?)
Paul Menzel
paulepanter at users.sourceforge.net
Sun Jul 31 20:34:21 CEST 2011
Dear responders,
thank you very much for all your answers, suggestions and corrections. I
will try to give feedback and share my findings.
Am Donnerstag, den 28.07.2011, 01:56 +0000 schrieb William Dunlap:
[…]
> You might try using sample(c(-1,1), size=length, replace=TRUE)
> instead of sign(rnorm(length)). I don't know if there would
> be any speed difference, but it frees you from the worry that
> rnorm() might return an exact 0.0.
That is a very helpful suggestions. Additionally I was also told to just
use integers to reduce the memory usage [1].
First of all I have to admit and apologize, that the example code I
provided did count all passing of the x-axis from negativ to positive
and did not do what I intended. I only wanted to count how long the
random walk stays negative. As a result I can jump out of the for loop,
which is the first element in half of the cases for the simple random
walk, and now trying to implement the suggested methods using the for
loop seems quicker and I wonder if R provides methods to just search for
a “first” element.
-------- 8< -------- code -------- >8 --------
f1 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
z = integer(length/2 + 1) # simple random walk only gets 0 on even timings
for (j in 1:n) {
x = sample(c(-1L, 1L), size=length, replace=TRUE)
if (x[1] >= 0 ) {
z[1] = z[1] + 1L
next
}
s = cumsum(x)
for (i in 1:length-1) {
if (s[i] < 0 && s[i + 1] >= 0) {
z[i/2 + 2] = z[i/2 + 2] + 1L # account for that we get positiv right away, that means a duration of 0
break
}
}
}
z
}
f2 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
z = integer(length/2 + 1) # simple random walk only gets 0 on even timings
for (i in 1:n) {
x = c(sample(c(-1L, 1L), size=length, replace=TRUE))
if (x[1] >= 0 ) {
z[1] = z[1] + 1L
next
}
s = cumsum(x)
minidx = which.max( c(s[-length]<0 & s[-1]>=0, TRUE) ) # The last TRUE is there to fix the case when the random walk stays negative the whole time. \
# Everything would be `FALSE` then and `which.max()` returns `1`.
if (minidx != length) {
z[minidx/2 + 2] <- z[minidx/2 + 2] + 1L # account for that we get positiv right away, that means a duration of 0
}
}
z
}
-------- 8< -------- code -------- >8 --------
The code results in the following timings.
> set.seed(1) ; system.time( z1 <- f1(1000, 1e5) )
User System verstrichen
13.421 0.300 13.946
> set.seed(1) ; system.time( z2 <- f2(1000, 1e5) )
User System verstrichen
27.281 0.616 28.010
> identical(z1, z2)
As written above, using the for loops we can abort further calculation
by jumping out of the loop. The R methods always compare all values and
I had to adapt the trick using TRUE and FALSE when the walk stays
negative all the time.
Can any of you experienced users think of an idea to get rid of the
inner loop? I cannot think of quicker matrix operations as done in the
Wiki for a use case [2].
I will try to do some timings to get rid of the outer loop using
matrices as suggested before. But I have to split it up further to avoid
memory issues [1].
Thanks,
Paul
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832#10
[2] http://rwiki.sciviews.org/doku.php?id=tips:programming:code_optim2
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110731/084dc885/attachment.bin>
More information about the R-help
mailing list