[R] Speedups with Ra and jit
Luke Tierney
luke at stat.uiowa.edu
Sun May 4 20:18:21 CEST 2008
A couple of comments on this and the original thread.
As pointed out by several posters, in a vectorized language like R one
can usually create the fastest and cleanest code by using vectorized
operations. This applies to R as well as Matlab.
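As a small illustration (with arbitrary made-up data), compare summing
a vector with an explicit interpreted loop to the vectorized sum():

x <- runif(1e6)

s.loop <- 0
for (xi in x) s.loop <- s.loop + xi   # each iteration is interpreted

s.vec <- sum(x)                       # the loop runs in compiled code

all.equal(s.loop, s.vec)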
That said, there are at times reasons for using code consisting of a
set of tight nested loops, such as

  - it may not be obvious how to rewrite the loops using vectorization

  - the code may match a published algorithm and so verifying its
    correctness and maintaining it in this form may be easier than
    rewriting

  - the code may not be fast or pretty but it works and gets the right
    answer
Given these points it would be nice if R's interpreter could do a
better job. R's interpreter is currently rather slow on these sorts
of computations. For example, a direct translation of the original
poster's grw_permute example runs about 5 times as fast in the
xlispstat interpreter (similar internals in many ways). There are
some things we could do involving relatively little effort to
marginally improve R's interpreter, but substantial improvements would
likely require a fairly extensive rewrite (which may be worth
considering at some point but probably not soon).
An alternative is to pursue automated tools to transform this sort of
code into faster code. This is in part the motivation for creating a
byte code compiler for R, and for the Ra/jit project. The current
byte code compiler (http://www.stat.uiowa.edu/~luke/R/compiler/)
speeds up this computation by about a factor of 3. This compiler does
not yet optimize variable lookup or indexing of vectors or matrices;
these should be added sometime this summer and should add another
factor of 3 or so for this example.
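As a usage sketch, assuming the compiler package from that page is
installed and provides cmpfun() for compiling a closure (and using a
made-up loop-heavy function here in place of grw_permute, which is not
reproduced in this thread):

library(compiler)

slow_loop <- function(x) {        # stand-in for the poster's code
    s <- 0
    for (i in seq_along(x)) s <- s + x[i]^2
    s
}

slow_loop.c <- cmpfun(slow_loop)  # byte-compile the closure

x <- rnorm(1e5)
system.time(slow_loop(x))
system.time(slow_loop.c(x))       # compiled version; should be faster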
Steve has done some very interesting things in his Ra/jit project.
Some of the optimizations done there are either already done in the
byte code compiler or have been planned for some time. Others, in
particular specialization for basic data types, may be best
done at run time as jit does, and there may be room for merging these
ideas with the static byte code compiler.
The particular specialization approach used in jit means that some
code will produce different results or generate errors; i.e. a user
who requests jit compilation is implicitly agreeing not to try to do
certain things, such as change types or sizes of values stored in
variables used in the compilation. In the long run I would prefer
either a mechanism by which such assumptions are declared explicitly
by the user or an arrangement for R to automatically switch back to
less optimized code when the assumptions of the optimization are
violated.
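Purely as an illustration of the kind of thing ruled out (and not a
claim about what jit actually does with this code), both functions
below run fine in ordinary R but change the type or size of a variable
mid-computation:

f1 <- function(x) {
    s <- 0L              # s starts out as an integer ...
    for (i in seq_along(x))
        s <- s + x[i]    # ... and silently becomes double if x is double
    s
}

f2 <- function(n) {
    v <- numeric(1)
    for (i in seq_len(n))
        v[i] <- i / 2    # v is grown as the loop runs
    v
}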
I believe the main aspect of runtime specialization done in jit now
that may be hard to match in statically compiled byte code is that
a function defined as
f <- function(x) {
    jit()
    s <- 0
    for (i in seq_along(x)) s <- s + x[i]
    s
}
will be optimized for integer data when run on integer x and for
real data when run on real x, and in both cases allocation of
intermediate results is avoided. How valuable this is in the long run
is not yet clear -- it would definitely be very helpful if machine
code were being generated that also allowed intermediate values to stay
in registers (which I believe is what psyco does), but that is messy
and hard to do across many platforms. With fast allocation the
benefits of avoiding allocation alone may not be that substantial.
For example, the byte compiled xlispstat version of the grw_permute
example mentioned above runs about twice as fast as the Ra/jit one
without avoiding intermediate allocation. This isn't conclusive, of
course, and it will be interesting to do some more careful tests and
see what directions those suggest.
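As a usage sketch of the specialization just described (vector sizes
arbitrary):

x.int <- 1:100000            # integer vector
x.dbl <- as.double(x.int)    # the same values as doubles

system.time(f(x.int))        # specialized for integer arithmetic
system.time(f(x.dbl))        # re-specialized for double arithmetic
all.equal(f(x.int), f(x.dbl))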
Best,
luke
On Fri, 2 May 2008, Bill.Venables at csiro.au wrote:
> The topic of Ra and jit has come up on this list recently
>
> (see http://www.milbo.users.sonic.net/ra/index.html)
>
> so I thought people might be interested in this little demo. For it I
> used my machine, a 3-year old laptop with 2Gb memory running Windows
> XP, and the good old convolution example, the same one as used on the
> web page, (though the code on the web page has a slight glitch in it).
>
> This is using Ra with R-2.7.0.
>
>> conv1 <- function(a, b) {
>>   ### with Ra and jit
>   require(jit)
>   jit(1)
>   ab <- numeric(length(a) + length(b) - 1)
>   for(i in 1:length(a))
>     for(j in 1:length(b))
>       ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
>   ab
> }
>>
>> conv2 <- function(a, b) {
>>   ### with just Ra
>   ab <- numeric(length(a) + length(b) - 1)
>   for(i in 1:length(a))
>     for(j in 1:length(b))
>       ab[i+j-1] <- ab[i+j-1] + a[i]*b[j]
>   ab
> }
>>
>> x <- 1:2000
>> y <- 1:500
>> system.time(tst1 <- conv1(x, y))
> user system elapsed
> 0.53 0.00 0.55
>> system.time(tst2 <- conv2(x, y))
> user system elapsed
> 9.49 0.00 9.56
>> all.equal(tst1, tst2)
> [1] TRUE
>>
>> 9.56/0.55
> [1] 17.38182
>>
>
> However for this example you can achieve speed-ups like that or better
> just using vectorised code intelligently:
>
>> conv3 <- local({
>   conv <- function(a, b, na, nb) {
>     r <- numeric(na + nb - 1)
>     ij <- 1:nb
>     for(e in a) {
>       r[ij] <- r[ij] + e*b
>       ij <- ij + 1
>     }
>     r
>   }
>
>   function(a, b) {
>     na <- length(a)
>     nb <- length(b)
>     if(na < nb) conv(a, b, na, nb) else
>                 conv(b, a, nb, na)
>   }
> })
>>
>> system.time(tst3 <- conv3(x, y))
> user system elapsed
> 0.11 0.00 0.11
>> all.equal(tst1, tst3)
> [1] TRUE
>
>> 0.55/0.11
> [1] 5
>> 9.56/0.11
> [1] 86.90909
>
> i.e. a further 5-fold increase in speed, or about 87 times faster than
> the unassisted naïve code.
>
> I think the lesson here is that if you really want to write R code as
> you might write C code, then jit can help make it practical in terms
> of time. On the other hand, if you write R code using as many of the
> inbuilt operators as you can, then you can possibly still do things
> better.
>
> Of course sometimes you don't have the right inbuilt operators. In
> that case you have a three-way choice: slow R code and wait, faster R
> code speeded up with Ra and jit, or (the way it probably should be
> done) dynamically loaded C or Fortran code. Portability
> decreases as you go, of course.
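>
> As a sketch of that third route (assuming a small C file, say
> convolve.c, containing essentially the convolution routine from
> Writing R Extensions, compiled with R CMD SHLIB and loaded with
> dyn.load()):
>
> dyn.load("convolve.so")    # "convolve.dll" on Windows
> conv4 <- function(a, b) {
>     ## calls: void convolve(double *a, int *na,
>     ##                      double *b, int *nb, double *ab)
>     .C("convolve",
>        as.double(a), as.integer(length(a)),
>        as.double(b), as.integer(length(b)),
>        ab = double(length(a) + length(b) - 1))$ab
> }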
>
>
> Bill Venables
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu