[Rd] Package compiler - efficiency problem
Tomas Kalibera
tom@@@k@liber@ @ending from gm@il@com
Mon Aug 13 18:02:23 CEST 2018
Dear Karol,
thank you for the report. I can reproduce that the function from your
example takes very long to compile and I can see where most time is
spent. The compiler is itself written in R and requires a lot of
resources for large functions (foo() has over 16,000 lines of code,
nearly 1 million instructions/operands, 45,000 constants). In
particular a lot of time is spent in garbage collection and in finding a
unique set of constants. Some optimizations of the compiler may be
possible, but it is unlikely that functions this large will compile fast
any time soon. For non-generated code, we now have byte-compilation on
installation by default, which at least removes the compile overhead from
runtime. Even though the compiler is slow, it is important to keep in
mind that in principle, with any compiler there will be functions where
compilation would not improve performance (whether or not the compile
time is included).
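To illustrate the trade-off between compile time and run time, here is a
minimal sketch. It assumes a made-up generator, make_big_fun() (a
hypothetical helper, not part of gEcon or of the compiler package), that
builds a function consisting of many element-wise assignments, and it
uses only cmpfun() and enableJIT() from the compiler package. The
timings depend on the machine and the R version, so none are shown.

```r
library(compiler)

# Hypothetical generator: builds a function whose body is n element-wise
# assignments, loosely mimicking the shape of the generated code.
make_big_fun <- function(n) {
  n <- as.integer(n)
  k <- seq_len(n)
  lines <- sprintf("r[%d] <- -vv[%d] + vv[%d] * (1 + pp[%d])", k, k, k, k)
  src <- c("function(vv, pp) {", sprintf("r <- numeric(%d)", n), lines, "r", "}")
  eval(parse(text = src))
}

old_jit <- enableJIT(0)   # keep the JIT from compiling f during the timing
f  <- make_big_fun(2000)
vv <- runif(2000)
pp <- runif(2000)

# Time the explicit compilation, then 20 calls of each version.
t_compile <- system.time(fc <- cmpfun(f))[["elapsed"]]
t_interp  <- system.time(for (i in 1:20) f(vv, pp))[["elapsed"]]
t_byte    <- system.time(for (i in 1:20) fc(vv, pp))[["elapsed"]]
invisible(enableJIT(old_jit))   # restore the previous JIT level

print(c(compile = t_compile, interpreted = t_interp, compiled = t_byte))
```

Compilation pays off only if the per-call saving, multiplied by the
number of calls, exceeds the compile time.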
I think it is not a good idea to generate code for functions like foo()
in R (or any interpreted language). You say that R's byte-code compiler
produces code that runs 5-10x faster than when the function is
interpreted by the AST interpreter (uncompiled), which sounds like a
good result, but I believe that avoiding code generation would be much
faster than that, apart from drastically reducing code size and
therefore compile time. The generator of these functions has much more
information than the compiler - it could be turned into an interpreter
of these functions and compute their values on the fly.
A significant source of inefficiency in the generated code is
element-wise operations, such as
r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
...
r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
(these could be vectorized, which would reduce code size, improve
interpretation speed, and make the code somewhat more readable). Most of
the code lines in the generated functions seem to be easily vectorizable.
Compilers and interpreters necessarily use some heuristics or optimize
for some code patterns. Optimizing for generated code may be tricky as it
could even harm performance of usual code. And, I would much rather
optimize the compiler for the usual code.
Indeed, a pragmatic solution requiring the least amount of work would be
to disable compilation of these generated functions. There is not a
documented way to do that and maybe we could add it (and technically it
is trivial), but I have been reluctant so far - in some cases,
compilation even of these functions may be beneficial - if the speedup
is 5-10x and the function is run very many times. But once the generated
code includes some pragma preventing compilation, it will never be
compiled. Also, the trade-offs may change as the compiler evolves,
perhaps not in this case, but in others where such a pragma may be used.
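For reference, the global switch mentioned in the original question is
enableJIT() from the compiler package. It is not a per-function pragma:
it turns the JIT on or off for the whole session and returns the
previous level. A minimal sketch, assuming that disabling the JIT
globally around the affected calls is acceptable; with_jit_off() is a
hypothetical helper, not part of the compiler package.

```r
library(compiler)

# Hypothetical wrapper: evaluate `expr` with the JIT switched off, then
# restore whatever level was active before.
with_jit_off <- function(expr) {
  old <- enableJIT(0)                  # returns the previous JIT level
  on.exit(enableJIT(old), add = TRUE)  # restore it even if expr fails
  expr                                 # the promise is forced here, JIT off
}

f <- function(x) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i]
  s
}

res <- with_jit_off(f(1:10))
```

Functions that were already compiled stay compiled; the switch only
prevents new JIT compilation while it is off.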
Well so the short answer would be that these functions should not be
generated in the first place. If rewriting were too much work,
perhaps the generator could just be improved to produce vectorized
operations.
Best
Tomas
On 12.8.2018 21:31, Karol Podemski wrote:
> Dear R team,
>
> I am a co-author and maintainer of one of R packages distributed by R-forge
> (gEcon). One of gEcon package users found a strange behaviour of package (R
> froze for couple of minutes) and reported it to me. I traced the strange
> behaviour to compiler package. I attach short demonstration of the problem
> to this mail (demonstration makes use of compiler and tictoc packages only).
>
> In short, the compiler package has problems in compiling large functions -
> their compilation and execution may take much longer than direct execution
> of an uncompiled function. Such functions are generated by gEcon package as
> they describe steady state for economy.
>
> I am curious if you are aware of such problems and plan to handle the
> efficiency issues. On one of the boards I saw that there were efficiency
> issues in rpart package but they have been resolved. Or would you advise to
> turn off JIT on package load (package heavily uses such long functions
> generated whenever a new model is created)?
>
> Best regards,
> Karol Podemski
>
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel