[R] glm.fit to use LAPACK instead of LINPACK

Fri Oct 23 04:45:56 CEST 2009

On Thu, 22 Oct 2009, Douglas Bates wrote:

> On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:
>> Ted,
>
>> LAPACK is newer and is supposed to contain better algorithms than LINPACK.  It is not true that LAPACK cannot handle rank-deficient problems.  It can.
>
> It's not just a question of handling rank-deficiency.  It's the
> particular form of pivoting that is used so that columns associated
> with the same term stay adjacent.
>
> The code that is actually used in glm.fit and lm.fit, called through
> the Fortran subroutine dqrls, is a modified version of the Linpack
> dqrdc subroutine.
>
>> However, I do not know if using LAPACK in glm.fit instead of LINPACK would be faster and/or more memory efficient.
>
> The big thing that could be gained is the use of level-3 BLAS.  The
> current code uses only level-1 BLAS.  Accelerated BLAS can take
> advantage of level 3 calls relative to level-1.
>

How would I change to level-3 ?  Would I need to rebuild R with some 
flags?  I imagine some comparative benchmarks.

> Even so, I doubt that the QR decomposition is the big time sink in
> glm.fit.  Why not profile a representative fit and check?  I did
> profile the glm.fit code a couple of years ago and discovered that a
> lot of time was being spent evaluating the various family functions
> like the inverse link and the variance function and that was because
> of calls to pmin and pmax.

What kind of profiling software should I use?  Is the the Rprof in R able 
to report which part of glm.fit is the bottleneck?

>
> Before trying to change very tricky Fortran code you owe it to
> yourself to check that the potential gains would be.

Thanks for the suggestions.

>
>> ----- Original Message -----
>> From: Ted <tchiang at sickkids.ca>
>> Date: Thursday, October 22, 2009 10:53 am
>> Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
>> To: "r-help at R-project.org" <r-help at r-project.org>
>>
>>
>>> Hi,
>>>
>>> I understand that the glm.fit calls LINPACK fortran routines instead of
>>> LAPACK because it can handle the 'rank deficiency problem'.  If my data
>>> matrix is not rank deficient, would a glm.fit function which runs on
>>> LAPACK be faster?  Would this be worthwhile to convert glm.fit to use
>>> LAPACK?  Has anyone done this already??  What is the best way to do this?
>>>
>>> I'm looking at very large datasets (thousands of glm calls), and would
>>> like to know if it's worth the effort for performance issues.
>>>
>>> Thanks,
>>>
>>> Ted
>>>
>>> -------------------------------------
>>> Ted Chiang
>>>   Bioinformatics Analyst
>>>   Centre for Computational Biology
>>>   Hospital for Sick Children, Toronto
>>>   416.813.7028
>>>   tchiang at sickkids.ca
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>>
>>> PLEASE do read the posting guide
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>