[R] How can I make my functions run faster

Berend Hasselman bhh at xs4all.nl
Mon Aug 19 19:22:36 CEST 2013


On 19-08-2013, at 18:24, William Dunlap <wdunlap at tibco.com> wrote:

>> I am a newbie too. I will share what I do normally for speeding up the code.
>> 
>> 1. Restrict defining too many variables (Global/ Local)
>> 2. Use apply functions (apply,sapply,lapply,tapply, etc.) whenever feasible
>> 3. Having multiple user defined functions doesn't help. Try to compact
>> everything in minimum number of functions
>> 4. The in-memory of R is just 10% of your total RAM (Correct me if wrong).
>> Make sure most of it is used for processing and not storing
> 
> (2) The apply functions do not run a whole lot faster than the equivalent loop
> and sometimes they run slower.  Their real advantage is that they make the
> code more understandable - you don't have to wade through the boilerplate
> of setting up a vector or matrix to return or worry about whether all the variables
> defined in the loop are temporaries or need to be retained for the rest of the
> function.  You can get big speedups by avoiding loops and apply functions and
> trying to vectorize the code instead (although sometimes vectorization leads
> to an unacceptable increase in memory usage).
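For example, here is a minimal sketch (the function names are mine, purely illustrative) comparing a loop, sapply(), and vectorized code for the same task:

```r
## Squaring a numeric vector three ways.
x <- runif(1e5)

f_loop <- function(x) {
  out <- numeric(length(x))        # preallocate the result vector
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}
f_apply <- function(x) sapply(x, function(xi) xi^2)
f_vec   <- function(x) x^2         # one vectorized operation

system.time(f_loop(x))
system.time(f_apply(x))            # usually no faster than the explicit loop
system.time(f_vec(x))              # typically much faster than either
```

The vectorized version also builds a full result vector at once, which is where the memory-usage caveat above comes from.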
> 
> (3) I think that splitting your problem up into several functions is helpful.  You
> can test each function on its own to check for correctness and speed.  You can
> reuse and share your tested functions.  There is some overhead involved
> in calling a function, but you have to make a lot of calls to notice.
> 
>     Make your functions "functional" - have almost all of the data come in via
> the argument list and have all of the output go out via its return value.  Without
> this, functions are hard to understand and to test.  The OP's functions had a lot
> of <<-'s in them - get rid of them.  They also passed key variables like 'N' through
> the global environment (or some other environment, depending on how they
> were called).  Put N in the argument list so you can easily see how the function using it
> (rspat) behaves as a function of N (it will be quadratic, which is bad if N gets big).
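To sketch the point (rspat here is only a stand-in body, not the OP's actual code):

```r
## Hidden dependence: reads N from whatever environment it runs in.
rspat_bad <- function() {
  sum(outer(1:N, 1:N, "+"))        # where does N come from?  Hard to test.
}

## Functional style: N comes in via the argument list.
rspat <- function(N) {
  sum(outer(1:N, 1:N, "+"))
}

rspat(10)
rspat(100)                         # now you can study how cost grows with N
```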
> 
>   Know the type, shape, and size of the data you will want to work on.  Some algorithms are
> fastest on matrices, some on datasets with many more rows than columns, and
> some on other types or shapes of data.  Run your function on a variety of smallish
> datasets to see how it behaves as the number of rows, columns, etc. grows.  It
> is a waste of time to wait days for a result when you could know that its time is growing
> as 0.0001 * nrow(X)^3.
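A rough sketch of that kind of measurement (crossprod is only a stand-in for whatever function is being studied):

```r
## Time a function on growing inputs to estimate its growth rate.
f <- function(X) crossprod(X)

for (n in c(100, 200, 400, 800)) {
  X <- matrix(rnorm(n * n), n, n)
  cat(n, system.time(f(X))["elapsed"], "\n")
}
## If elapsed time roughly multiplies by 8 each time n doubles, the cost
## grows like n^3 - and you can extrapolate before committing days to a run.
```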
> 
>   Then there are all the standard recommendations on speeding things up - don't
> do an expensive computation if you don't use its result, don't recompute things
> every time you pass through a loop, etc.
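A minimal sketch of the recomputation point (names are mine, illustrative only):

```r
## mean(y) never changes inside the loop, yet is recomputed every pass.
slow <- function(x, y) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    m <- mean(y)                   # loop-invariant work, done n times
    out[i] <- x[i] - m
  }
  out
}

## Hoist it out - and here the loop itself vectorizes away entirely.
fast <- function(x, y) {
  m <- mean(y)                     # computed once
  x - m
}
```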


And when you have done all of the previous, consider byte compiling your functions with the compiler package, which is available in standard R. Often, but not always, this can lead to a speedup.
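A minimal sketch of that, using cmpfun() from the standard compiler package (the loop-heavy function is just an illustrative candidate):

```r
library(compiler)

f <- function(x) {
  s <- 0
  for (xi in x) s <- s + xi        # interpreted loop: a good candidate
  s
}
fc <- cmpfun(f)                    # byte-compiled version, same semantics

x <- runif(1e6)
stopifnot(all.equal(f(x), fc(x)))  # identical answers
system.time(f(x))
system.time(fc(x))                 # often, but not always, faster
```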

Berend
