[R] long algo

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Oct 30 23:55:38 CET 2003


"Alessandro Semeria" alessandro.semeria at cramont.it wrote:
	It is well known that R is inefficient on loops.

This is a dangerous half-truth.  R is an interpreted language.
The interpreter uses techniques similar to those used in Scheme
interpreters.  As interpreters go, it's pretty good.  For comparison,
in processing XML documents, I've had interpreted Scheme running rings
around compiled Java (by doing the task a different way, of course).
Also for comparison, years ago I had a Prolog program for median
polish that made a published Fortran program for median polish look
sick (by using a much better data structure).  With Luke Tierney's
byte-code compiler, I expect R loops will become close to as efficient
as Python ones, and people run entire web sites with Python.

It is more accurate to say that R code qua R code is not as efficient
as the large body of "primitives" that operate on entire arrays.
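To make the contrast concrete, here is a hypothetical sketch (the function names are mine, not from the post): the same sum of squares written first as an interpreted element-at-a-time loop, then as a call to primitives that operate on the whole vector at once.

```r
# A made-up illustration: summing the squares of a vector.
# The explicit loop is interpreted R code executed once per element;
# the one-liner pushes the same work into whole-vector primitives.

x <- as.numeric(1:100000)

slow.sumsq <- function(x) {
  total <- 0
  for (i in seq(along = x)) total <- total + x[i]^2
  total
}

fast.sumsq <- function(x) sum(x^2)

stopifnot(all.equal(slow.sumsq(x), fast.sumsq(x)))
```

Both return the same number; only the second spends its time inside compiled primitives rather than in the interpreter.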

	When you have to perform a "heavy" loop,
	it is better to call Fortran or C code (the .Fortran() and .C() functions)

Even if the premiss were literally and exactly true, the conclusion
would not follow.  When you have a speed problem with R code,

(1) Find out where the problem is, exactly.  People's intuition about
    performance bottlenecks is notoriously bad.  Do what the experts do:
    *measure*.
(2) Try to restructure the code *entirely in R* to be as clear and high
    level as possible.  If there have to be subscripts, at least let them
    be vector subscripts.
(3) Measure again.  Chances are that making the code clear and high level
    has fixed the performance problem.
(4) If that fails, try restructuring the code a couple of ways,
    *entirely in R*.  The two basic techniques for optimising a calculation
    are (a) eliminate it entirely and (b) if you can't eliminate the first
    evaluation of an expression, eliminate the second by saving the result.
    As a special case of (b), try moving things out of loops; try splitting
    a calculation into a part that changes a lot and a part that changes
    very little, and update the small-change part only when you have to.
    Perhaps apply the idea of program differentiation.  (NOT the idea of
    taking a function that computes a value and automatically computing
    a function that computes the derivative of the first, but the idea of
    saying: if I have z <- f(x,y) and I make a small change to x, do I have
    to recompute z completely, or can I make a small change to z?)
    Try to use built-in operations as much as possible, on data structures
    that are as large as appropriate.
(5) Measure again.  This will probably have fixed the performance problem.
(6) If all else fails, now it's time to try Fortran or C.  It's too bad
    there isn't an existing Fortran or C module you can just call; if there
    had been, you'd have used it before writing the original R code.
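Steps (1)-(3) might look like this in practice.  This is a hypothetical running example, not one from the post: time an element-at-a-time loop with system.time, restructure it entirely in R around a vector subscript, and time it again.

```r
# Hypothetical example of measure -> restructure -> measure again.

n <- 20000
v <- rnorm(n)

# Original: element-at-a-time loop selecting positive values.
loop.version <- function(v) {
  out <- numeric(0)
  for (i in seq(along = v))
    if (v[i] > 0) out <- c(out, v[i])   # grows the result inside the loop
  out
}

# Restructured entirely in R: one logical (vector) subscript.
vector.version <- function(v) v[v > 0]

t1 <- system.time(r1 <- loop.version(v))    # step (1): measure
t2 <- system.time(r2 <- vector.version(v))  # step (3): measure again
stopifnot(identical(r1, r2))
```

The restructured version is not just faster; it also says more directly what is being computed, which is the real point of step (2).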
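The "program differentiation" idea in step (4) can also be sketched in a few lines of R.  The scenario below is invented for illustration: z <- sum(x) is the complete computation, and after a small change to one element of x we update z incrementally instead of recomputing the whole sum.

```r
# Hypothetical sketch of program differentiation: propagate a small
# change in the input as a small change in the result.

x <- rnorm(1000)
z <- sum(x)                  # full computation, done once

# Small change: replace x[i] with a new value.
i <- 42
new.value <- 3.5
z <- z - x[i] + new.value    # small change to z, no full recompute
x[i] <- new.value

stopifnot(all.equal(z, sum(x)))
```

For a length-1000 vector this saves little, but when f(x,y) is expensive and the changes are small, the incremental update can eliminate almost all of the repeated work.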




More information about the R-help mailing list