[R] How to write efficient R code

Wed Feb 18 00:52:19 CET 2004

Rajarshi Guha <rxg218 at psu.edu> writes:

> On Tue, 2004-02-17 at 12:21, Tom Blackwell wrote:
> > Lennart  -
> > 
> > My two rules are:
> > 
> >   1. Be straightforward.  Don't try to be too fancy.  Don't worry
> >   about execution time until you have the WHOLE thing programmed
> >   and DOING everything you want it to.  Then profile it, if it's
> >   really going to be run more than 1000 times.  Execution time
> >   is NOT the issue.  Code maintainability IS.
> > 
> >  2. Use vector operations wherever possible.  Avoid explicit loops.
> >     However, the admonition to avoid loops is probably much less
> >     important now than it was with the Splus of 10 or 15 years ago.
> > 
> > (Not that I succeed in obeying these rules myself, all the time.)
> > 
> > Remember:  execution time is not the issue.  memory size may be.
> > clear, maintainable code definitely is.
> 
> I've been using R for maybe 6 months or less and am by no means an R
> expert. But the above two points are extremely valid - my policy is to
> always write code that I can read 2 months later without comments
> (though in the end I do add them) - even if it requires loops.
> 
> However, after I'm sure the results are right I spend time on trying to
> vectorise the code. And that has improved performance by orders of
> magnitude (IMO, it's also more elegant to have a one line vector
> operation rather than a loop).

All true. A couple of additional remarks:

1) Some constructs are spectacularly inefficient, as you'll realize
   when you think about what they have to do. One standard example is

        for (i in 1:10000) 
            x[i] <- f(i)

   which becomes much faster if you preallocate x <- numeric(10000)
   (never mind that sapply will do it more neatly). Without
   preallocation, R will need to extend the array on every iteration,
   which requires the whole array to be copied to a new location. It is
   a very good idea to keep your eyes open for these situations and
   try to avoid them.
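
A minimal sketch of the three variants mentioned in point 1, for anyone who wants to try the comparison. The function f here is only a hypothetical stand-in for whatever is computed per element, and the helper names grow, prealloc and via_sapply are made up for illustration. The size of the timing gap depends on the R version and the machine, so no timings are quoted.

```r
# f is a hypothetical placeholder for the real per-element computation.
f <- function(i) i^2
n <- 10000

# Variant 1: x starts empty and is extended on every iteration.
grow <- function(n) {
  x <- numeric(0)
  for (i in 1:n)
    x[i] <- f(i)
  x
}

# Variant 2: x is preallocated to its final length before the loop.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n))
    x[i] <- f(i)
  x
}

# Variant 3: let sapply build the result vector.
via_sapply <- function(n) sapply(seq_len(n), f)

x1 <- grow(n)
x2 <- prealloc(n)
x3 <- via_sapply(n)

# All three variants should produce the same vector.
stopifnot(identical(x1, x2), identical(x2, x3))

# Compare elapsed times; results will differ between systems.
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(system.time(via_sapply(n)))
```

The results are checked for equality first, so that any difference seen in system.time reflects only how the vector is built and not what is computed.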

2) On the other hand, don't be trapped by efficiency differences that
   might be "accidental" and go away in later releases. We've seen a
   couple of cases where the Wrong Way was actually faster than the
   Right Way (details elude me -- something with deparse/reparse vs.
   symbolic computations, I suspect), but this easily leads to
   code that is hard to read, and may have subtle bugs.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907