[R] Is R the right choice for simulating first passage times of random walks?
Paul Menzel
paulepanter at users.sourceforge.net
Tue Aug 2 13:41:12 CEST 2011
Dear Dennis and Steve,
Am Sonntag, den 31.07.2011, 23:32 -0400 schrieb Steve Lianoglou:
[…]
> How about trying to write the of this `f4` function below using the
> rcpp/inline combo. The C/C++ you will need to write looks to be quite
> trivial, let's change f4 to accept an x argument as a vector:
>
> I've defined f4 in the same way as Dennis did:
>
> > f4 <- function()
> > {
> > x <- sample(c(-1L,1L), 1)
> >
> > if (x >= 0 ) {return(1)} else {
> > csum <- x
> > len <- 1
> > while(csum < 0) {
> > csum <- csum + sample(c(-1, 1), 1)
> > len <- len + 1
> > } }
> > len
> > }
>
> Now, let's do some inline/c++ mojo:
>
> library(inline)
> inc <- "
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
> "
>
> fxx <-cxxfunction(includes=inc, plugin="Rcpp", body="
> int len = 1;
> int x = ((rand() % 2 ) == 0) ? 1 : -1;
> int csum = x;
>
> while (csum < 0) {
> x = ((rand() % 2 ) == 0) ? 1 : -1;
> len++;
> csum = csum + x;
> }
>
> return wrap(len);
> ")
>
> Assuming I've faithfully translated this into c++, the timings aren't
> all that comparable.
>
> Doing 500 replicates with the pure R version:
>
> set.seed(123)
> system.time(out <- replicate(500, f4()))
> user system elapsed
> 31.525 0.120 32.510
>
> Doing 10,000 replicates using the fxx function doesn't even break a sweat:
>
> system.time(outxx <- replicate(10000, fxx()))
> user system elapsed
> 0.371 0.001 0.373
>
> range(out)
> [1] 1 1994308
>
> range(outxx)
> [1] 1 11909394
thank you very much for your suggestions.
This is indeed a nice speed.
1. I first had that implemented in FORTRAN (and Python) too, but turned
to R for two reasons. First I wanted to use also other distributions
later on and thought that it would be easier with R and that R would
have that implemented as fast as possible. Secondly I thought that R
would also operate faster having the right vectorization and using
`csum()`. But I guess it is difficult to find a good model to use the
advantages of R.
Especially looking at `top` when running this example CPU is used 100 %
but memory only 40 MB from 2 GB. So if one could use another data
structure maybe the calculations could be done on more walks at once.
2. It is indeed possible that the walk never returns to zero, so I
should make sure, that I abort the while loop after a certain length.
3. Looking at the data types I am wondering if some integer overflow(?)
could happen. I could make the length variable unsigned I suppose [1].
But still `csum` could go from `-len` to 0 and for the normal random
walk unsigned should not be a problem too besides that the logic/checks
have to be adapted.
For integrated random walks, `ccsum += csum`, `ccsum` would go from
-(ccsum**2)/2 up to 0. So later on I should use probably the 64 bit data
type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those
problems. Memory does not seem to be a problem. Also I need to add an
additional check for the height and length in the while loop like the
following.
(csum < 0) && (csum > -ULONG_MAX) && (len =< ULONG_MAX)
So I came up with the following and to use unsigned I only consider that
the random walk stays positive instead of negative.
-------- 8< -------- code -------- >8 --------
library(inline)
inc <- "
#include <climits>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
"
f9 <-cxxfunction(includes=inc, plugin="Rcpp", body="
unsigned long len = 1;
if ((rand() % 2 ) == 0) {
return wrap(len);
}
unsigned long x = 1;
for (unsigned long csum = x; csum > 0; csum = ((rand() % 2 ) == 0) ? csum + 1: csum - 1) {
len++;
if ((csum == ULONG_MAX) && (len == ULONG_MAX)) {
return wrap(len);
}
}
return wrap(len);
")
-------- 8< -------- code -------- >8 --------
I do not know if the compiler would have optimized it that way anyway
and if there is any difference (besides the overflow checks).
> set.seed(1); system.time( z9_1 <- replicate(1000, f9()) )
User System verstrichen
0.076 0.004 0.084
> range(z9_1)
[1] 1 1449034
> length(z9_1)
[1] 1000
Thanks,
Paul
[1] https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110802/378b9b6b/attachment.bin>
More information about the R-help
mailing list