[Rd] Speed up code, profiling, optimization, lapply vs. loops

Roger Peng rdpeng at gmail.com
Tue Jul 7 13:54:38 CEST 2009


That's a good point---I've found that skipping a lot of the setup that
'glm' does and calling 'glm.fit' directly can save a lot of time.
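
A minimal sketch of the idea, on simulated data: build the design matrix
once with 'model.matrix' and call 'glm.fit' inside the loop. The names
'dat', 'X', 'thetas' and 'make_family' are illustrative assumptions, and
the theta-dependent family here (a Poisson family with a power link) is
only a stand-in for the variance-function family in the original post,
which is not shown in the thread.

```r
# Simulated data (assumption: the original data are not shown in the thread).
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- rpois(n, lambda = exp(0.5 + dat$x))

# Build the design matrix and response once, outside the loop.
X <- model.matrix(~ x, data = dat)
y <- dat$y

# Hypothetical stand-in for a theta-dependent family object:
# a Poisson family whose link is power(theta) (theta = 0 gives the log link).
make_family <- function(theta) poisson(link = power(theta))

thetas <- c(0, 0.5, 1)

# Call glm.fit directly: no formula parsing or model-frame setup per fit.
fits <- lapply(thetas, function(theta) {
  glm.fit(x = X, y = y, family = make_family(theta))
})

# Extract whatever the later computation needs, e.g. the deviance.
deviances <- vapply(fits, function(f) f$deviance, numeric(1))
```

'glm.fit' returns a plain list rather than a full "glm" object, so
components such as 'coefficients' and 'deviance' are taken from the list
directly. Whether this saves much time depends on how large the setup
cost is relative to the fitting itself.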

-roger

On Tue, Jul 7, 2009 at 12:53 AM, Kasper Daniel
Hansen <khansen at stat.berkeley.edu> wrote:
> Aside from the advice from other people, you seem to be doing many glm
> calls. A big part of a call to a model function involves setting up the
> design matrix, checking for missing values, etc. If I understand your
> description correctly, you may only need to do this once. This will require
> some poking around in glm, but might save you a lot of time.
>
> Kasper
>
> On Jul 6, 2009, at 1:26, Thorn Thaler wrote:
>
>> Hi everybody,
>>
>> currently I'm writing a package that, for a given family of variance
>> functions depending on a parameter theta, say, computes the extended quasi
>> likelihood (eql) function for different values of theta.
>>
>> The computation involves a couple of calls to the 'glm' routine. What I'm
>> doing now is to call 'lapply' on a list of theta values with a function
>> that constructs a family object for the particular choice of theta, fits
>> the glm and uses the results to get the eql. Not surprisingly, the function
>> is not very fast. Depending on the size of the parameter space under
>> consideration, it takes a couple of minutes until the function finishes.
>> Testing ~1000 parameters takes about 5 minutes on my machine.
>>
>> I know that loops in R are slow more often than not, so I thought that
>> 'lapply' would be a better choice. But it is really just another way of
>> writing a loop. Besides, it adds some overhead for the function call, so
>> I'm not sure whether 'lapply' is actually faster.
>>
>> What I would like to find out is where the bottleneck lies.
>> Vectorization would help, but I don't think there is a vectorized 'glm'
>> function that can handle a vector of family objects, so I'm not aware of
>> any alternative to using a loop.
>>
>> So my questions:
>> - how can I figure out where the bottleneck lies?
>> - is 'lapply' always superior to a loop in terms of execution time?
>> - are there any 'evil' commands that should be avoided in a loop because
>> they slow down the computation?
>> - are there any good books or tutorials on how to profile R code
>> efficiently?
>>
>> Thanks in advance for your help,
>>
>> Thorn
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
