[R] glm.fit to use LAPACK instead of LINPACK

Douglas Bates bates at stat.wisc.edu
Fri Oct 23 03:39:05 CEST 2009


On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:
> Ted,

> LAPACK is newer and is supposed to contain better algorithms than LINPACK.  It is not true that LAPACK cannot handle rank-deficient problems.  It can.

It's not just a question of handling rank-deficiency.  It's the
particular form of pivoting that is used so that columns associated
with the same term stay adjacent.

The code that is actually used in glm.fit and lm.fit, called through
the Fortran subroutine dqrls, is a modified version of the LINPACK
dqrdc subroutine.
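
A minimal sketch (made-up data) of what that pivoting buys you: lm()
uses the pivoted decomposition to detect an aliased column, and the
pivot order is available from the fitted object.

set.seed(1)
x1 <- rnorm(10)
x2 <- 2 * x1              # deliberately aliased with x1
x3 <- rnorm(10)
y  <- rnorm(10)
fit <- lm(y ~ x1 + x2 + x3)
fit$qr$pivot              # the aliased column is pivoted to the end
coef(fit)                 # ... and its coefficient is reported as NA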

> However, I do not know if using LAPACK in glm.fit instead of LINPACK would be faster and/or more memory efficient.

The big thing that could be gained is the use of level-3 BLAS: the
current code uses only level-1 BLAS, and accelerated BLAS
implementations benefit far more from level-3 calls than from level-1
calls.
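
You can get a rough feel for the gap with base R's qr(), which
exposes both paths; the sizes here are arbitrary and the timings
depend entirely on which BLAS you have linked:

X <- matrix(rnorm(2000 * 500), 2000, 500)
system.time(qr(X, LAPACK = FALSE))  # LINPACK-style dqrdc2, level-1 BLAS
system.time(qr(X, LAPACK = TRUE))   # LAPACK dgeqp3, blocked, level-3 BLAS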

Even so, I doubt that the QR decomposition is the big time sink in
glm.fit.  Why not profile a representative fit and check?  When I
profiled the glm.fit code a couple of years ago I discovered that a
lot of time was being spent evaluating the various family functions,
such as the inverse link and the variance function, largely because
of calls to pmin and pmax.
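
Something along these lines would do it (the data and the output file
name are made up for illustration):

n <- 1e4
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, drop(plogis(X %*% c(-1, 2))))
Rprof("glmfit.prof")
for (i in 1:200) fit <- glm.fit(X, y, family = binomial())
Rprof(NULL)
head(summaryRprof("glmfit.prof")$by.self)  # see where the time goes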

Before trying to change very tricky Fortran code you owe it to
yourself to check what the potential gains would be.
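
As a back-of-the-envelope check (made-up sizes; substitute your own),
compare a full fit against the cost of one QR of the model matrix,
keeping in mind that glm.fit repeats the decomposition once per IRLS
iteration:

n <- 1e5; p <- 25
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- rbinom(n, 1, 0.5)
system.time(glm.fit(X, y, family = binomial()))  # whole IRLS fit
system.time(qr(X))                               # one LINPACK-style QR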

> ----- Original Message -----
> From: Ted <tchiang at sickkids.ca>
> Date: Thursday, October 22, 2009 10:53 am
> Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
> To: "r-help at R-project.org" <r-help at r-project.org>
>
>
>> Hi,
>>
>> I understand that glm.fit calls LINPACK Fortran routines instead of
>> LAPACK because it can handle the rank-deficiency problem.  If my data
>> matrix is not rank deficient, would a glm.fit function which runs on
>> LAPACK be faster?  Would it be worthwhile to convert glm.fit to use
>> LAPACK?  Has anyone done this already?  What is the best way to do this?
>>
>> I'm looking at very large datasets (thousands of glm calls), and would
>> like to know whether the conversion is worth the effort for performance.
>>
>> Thanks,
>>
>> Ted
>>
>> -------------------------------------
>> Ted Chiang
>>   Bioinformatics Analyst
>>   Centre for Computational Biology
>>   Hospital for Sick Children, Toronto
>>   416.813.7028
>>   tchiang at sickkids.ca
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



