[R] How to make R running faster
Robert A LaBudde
ral at lcfltd.com
Wed May 28 17:56:31 CEST 2008
At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote:
>Erin Hodgess wrote:
>>I remember reading the colSum and colMean were better, when you need
>>sums and means
>
>Well .. I'm waiting for the experts to jump in and give us the
>straight story on this :-)
All of the algorithms are represented internally by sequential
program logic using C or Fortran, for example. So the issue isn't the
algorithm itself. Instead, it's where the algorithm is implemented.
However, R is an interpreter, not a compiler. This means that it
works through your R code at run time, expression by expression, to
determine what is wanted and to check for errors in syntax and data
classes. Interpreted code is very slow compared to compiled code,
where the lines have been translated in advance into machine code
with error checking already resolved.
For example, a simple "for" loop iteration might take only 0.1
microsecond in a compiled program, but 20-100 microseconds in an
interpreted program.
This overhead of interpreting each line can be amortized by the
function calls inside each line. If a compiled function executes on a
large number of cases in one call, then the 50 microsecond overhead
per call is diluted.
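
To make the per-call overhead concrete, here is a minimal sketch that
calls sqrt() once per element inside a loop and then once on the whole
vector. The data and the variable names (x, slow, fast) are made up for
illustration, and the timings you see will depend on your machine and
your version of R.

```r
# Hypothetical example data: one million positive numbers.
set.seed(1)
x <- runif(1e6)

# One interpreted iteration (and one sqrt() call) per element.
slow <- numeric(length(x))
t_loop <- system.time(
  for (i in seq_along(x)) slow[i] <- sqrt(x[i])
)

# A single call: the loop over the elements runs inside compiled code.
t_vec <- system.time(fast <- sqrt(x))

# Compare elapsed times; both approaches give the same answer.
print(t_loop["elapsed"])
print(t_vec["elapsed"])
stopifnot(isTRUE(all.equal(slow, fast)))
```

The work done per element is the same in both cases; only the number
of interpreted steps differs.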
R is a vector-oriented language. If you use vectors and arrays
and the built-in (i.e., compiled) function calls, you get maximum use
of the compiled programs and minimum use of the interpreted program.
This is why functions such as colMeans() are faster than writing
direct loops in R. (apply() is itself a loop written in R, so expect
a much smaller gain from it.) You can speed things up by 200-1000x
for large arrays.
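
As a rough way to check this on your own machine, the sketch below
computes column means three ways: an explicit double loop, apply(), and
colMeans(). The matrix size and the helper loop_col_means() are
assumptions chosen for illustration, not part of any package, and no
particular speed ratio is promised.

```r
# Hypothetical test matrix: 2000 rows by 500 columns.
set.seed(42)
m <- matrix(rnorm(2000 * 500), nrow = 2000, ncol = 500)

# Explicit double loop: every addition is an interpreted step.
loop_col_means <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    total <- 0
    for (i in seq_len(nrow(m))) total <- total + m[i, j]
    out[j] <- total / nrow(m)
  }
  out
}

t_loop  <- system.time(a  <- loop_col_means(m))
t_apply <- system.time(b  <- apply(m, 2, mean))
t_cm    <- system.time(cm <- colMeans(m))

# Elapsed seconds for each approach.
print(c(loop = t_loop[["elapsed"]],
        apply = t_apply[["elapsed"]],
        colMeans = t_cm[["elapsed"]]))

# All three agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(a, cm)), isTRUE(all.equal(b, cm)))
```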
Interpreted languages are very convenient to use, as they do instant
error checking and are very interactive. No overhead of learning and
using compilers and linkers. But they are very slow on complex
calculations. This is why the array processing is stuffed into
compiled functions. The best of both worlds then.
Interpreted languages include R, MATLAB, Gauss and others (Java sits
in between: it is compiled to bytecode that runs on a virtual
machine). Compiled languages include C and Fortran. Some, like
variants of BASIC, can be interpreted, line-compiled or compiled,
depending upon implementation. Some compiled languages (such as
Fortran) allow parallel processing via multiprocessing on multiple
CPUs, which speeds things up even more. Compilers also typically
optimize code for the target machine, which can speed things up by a
factor of 2 or so.
So the general rule for R is: If you are annoyed at processing time,
alter your program to maximize calculations within compiled functions
(i.e., "vectorize" your program to process an entire array at one
time) and minimize the number of lines of R.
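
As one illustration of what "vectorizing" can look like in practice,
the sketch below rewrites an element-by-element loop with an if() test
as a single call to pmax(). The task (replacing negative values with
zero) and the function names clip_loop() and clip_vec() are invented
for this example.

```r
# Loop version: one interpreted comparison and assignment per element.
clip_loop <- function(x) {
  out <- x
  for (i in seq_along(x)) {
    if (x[i] < 0) out[i] <- 0
  }
  out
}

# Vectorized version: the whole vector is handled in one compiled call.
clip_vec <- function(x) pmax(x, 0)

# Hypothetical data to try it on.
set.seed(7)
x <- rnorm(1e5)

r_loop <- clip_loop(x)
r_vec  <- clip_vec(x)

# Same result, far fewer lines of R executed.
stopifnot(isTRUE(all.equal(r_loop, r_vec)))
```

The same pattern applies to many loops that test or transform one
element at a time: look for a built-in that accepts the whole vector.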
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"