[Rd] Speed up code, profiling, optimization, lapply vs. loops
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Tue Jul 7 06:53:36 CEST 2009
Aside from the advice from other people, you seem to be doing many glm
calls. A big part of a call to a model function involves setting up
the design matrix, checking for missing values, etc. If I understand your
description correctly, you may only need to do this once. This will
require some poking around in glm, but might save you a lot of time.
Kasper
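
A minimal sketch of the suggestion above, under stated assumptions: the data, the formula and the use of quasi() with its built-in variance functions are hypothetical stand-ins for the poster's theta-dependent family, which is not shown in the thread. The idea is to build the model frame and design matrix once and then call glm.fit() directly for each family. The sketch does not compute the eql itself.

```r
# Hypothetical data: positive responses, so every variance function
# used below has a finite deviance.
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
mu  <- exp(0.5 + 0.3 * dat$x1 - 0.2 * dat$x2)
dat$y <- rgamma(n, shape = 2, rate = 2 / mu)

# Done once: model frame (drops missing values), response, design matrix.
mf <- model.frame(y ~ x1 + x2, data = dat)
X  <- model.matrix(attr(mf, "terms"), mf)
y  <- model.response(mf)

# Stand-in for "construct a family object for a given theta".
# quasi() inspects its arguments unevaluated, so do.call() is used to
# pass the variance name as a value rather than as a variable name.
make_family <- function(variance) {
  do.call(quasi, list(link = "log", variance = variance))
}

variances <- c("mu", "mu^2", "mu^3")

# Done per family: only the fitting step, via glm.fit().
fits <- lapply(variances, function(v) {
  glm.fit(x = X, y = y, family = make_family(v))
})
names(fits) <- variances

# One deviance per variance function.
sapply(fits, function(f) f$deviance)
```

glm.fit() returns a plain list rather than a full "glm" object, so components such as fits[["mu"]]$coefficients and fits[["mu"]]$deviance are available, but methods that need the formula or the data (predict() with new data, for example) are not. Whether this saves much time depends on how large the setup cost is relative to the fit itself, which is worth measuring first.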
On Jul 6, 2009, at 1:26 , Thorn Thaler wrote:
> Hi everybody,
>
> currently I'm writing a package that, for a given family of
> variance functions depending on a parameter theta, say, computes the
> extended quasi likelihood (eql) function for different values of
> theta.
>
> The computation involves a couple of calls of the 'glm' routine.
> What I'm doing now is to call 'lapply' on a list of theta values
> with a function that constructs a family object for the particular
> choice of theta, computes the glm and uses the results to get the
> eql. Not surprisingly, the function is not very fast. Depending on
> the size of the parameter space under consideration, it takes a
> couple of minutes until the function finishes. Testing ~1000
> parameters takes about 5 minutes on my machine.
>
> I know that loops in R are slow more often than not. Thus, I thought
> using 'lapply' would be a better way. But anyway, it is just another
> way of writing a loop. Besides, it involves some overhead for the
> function call, and hence I'm not sure whether using 'lapply' is
> really the better choice.
>
> What I would like to figure out is where the bottleneck lies.
> Vectorization would help, but I don't think that there is a
> vectorized 'glm' function that is able to handle a vector of
> family objects, so I'm not aware of any alternative to
> using a loop.
>
> So my questions:
> - how can I figure out where the bottleneck lies?
> - is 'lapply' always superior to a loop in terms of execution time?
> - are there any 'evil' commands that should be avoided in a loop,
> for they slow down the computation?
> - are there any good books, tutorials about how to profile R code
> efficiently?
>
> Thanks in advance for your help,
>
> Thorn
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
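
For the first two questions in the quoted message (finding the bottleneck, and 'lapply' versus a loop), the following is a rough sketch using R's sampling profiler Rprof() and system.time(). The workload is hypothetical: fit_one() ignores theta and just fits a small Poisson glm, standing in for the per-theta computation described in the quoted message. Timings depend on the machine, so none are shown.

```r
# Hypothetical workload standing in for "one glm fit per theta value".
set.seed(1)
dat   <- data.frame(x = rnorm(500))
dat$y <- rpois(500, lambda = exp(0.2 + 0.4 * dat$x))
thetas <- seq_len(400)   # placeholder parameter grid

fit_one <- function(theta) {
  # theta is unused in this stand-in; a real version would build a
  # family object from it before fitting.
  deviance(glm(y ~ x, family = poisson(), data = dat))
}

# 1. Sampling profiler: where is the time spent?
prof_file <- tempfile()
Rprof(prof_file)
res_prof <- lapply(thetas, fit_one)
Rprof(NULL)
head(summaryRprof(prof_file)$by.total, 10)

# 2. lapply versus a for loop that preallocates its result list.
t_lapply <- system.time(res_lapply <- lapply(thetas, fit_one))
t_loop   <- system.time({
  res_loop <- vector("list", length(thetas))
  for (i in seq_along(thetas)) res_loop[[i]] <- fit_one(thetas[i])
})
c(lapply = t_lapply[["elapsed"]], loop = t_loop[["elapsed"]])
```

When each iteration is dominated by a call as heavy as glm(), the difference between the two looping constructs is likely to be small compared with the cost of the fits themselves. The profiler output is the more useful of the two, since it shows how much of the total time goes to setup functions such as model.frame and model.matrix rather than to glm.fit.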