[R] glm.fit to use LAPACK instead of LINPACK
tchiang at sickkids.ca
Fri Oct 23 04:45:56 CEST 2009
On Thu, 22 Oct 2009, Douglas Bates wrote:
> On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:
>> LAPACK is newer and is supposed to contain better algorithms than LINPACK. It is not true that LAPACK cannot handle rank-deficient problems. It can.
> It's not just a question of handling rank-deficiency. It's the
> particular form of pivoting that is used so that columns associated
> with the same term stay adjacent.
> The code that is actually used in glm.fit and lm.fit, called through
> the Fortran subroutine dqrls, is a modified version of the Linpack
> dqrdc subroutine.
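To make the pivoting point concrete, here is a small sketch, assuming only base R's qr() and its documented LAPACK = TRUE switch. The matrix X and its column names are made up for illustration; this is not the code path inside glm.fit itself, only the two QR back ends that base R exposes.

```r
# Illustrative design matrix: intercept, x1, an exact copy of x1, and x3.
# The true rank is 3, not 4.
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x3 <- rnorm(n)
X  <- cbind(intercept = 1, x1 = x1, x1_copy = x1, x3 = x3)

q_linpack <- qr(X)                 # default: modified LINPACK routine (dqrdc2)
q_lapack  <- qr(X, LAPACK = TRUE)  # LAPACK column-pivoted QR

# LINPACK path: a column is moved to the end only when it is found to be
# (near) collinear with earlier ones; the rest keep their original order.
print(q_linpack$rank)
print(colnames(X)[q_linpack$pivot])

# LAPACK path: a different pivoting rule is used, and ?qr documents that
# the reported rank is always the full rank in this case.
print(q_lapack$rank)
print(colnames(X)[q_lapack$pivot])
```

With the default back end the duplicated column should be the only one moved, which is the behaviour that keeps columns of the same term adjacent.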
>> However, I do not know if using LAPACK in glm.fit instead of LINPACK would be faster and/or more memory efficient.
> The big thing that could be gained is the use of level-3 BLAS. The
> current code uses only level-1 BLAS. Accelerated BLAS libraries get
> much more of their speedup from level-3 calls than from level-1 calls.
How would I change to level-3 BLAS? Would I need to rebuild R with some
flags? I imagine I would need to run some comparative benchmarks.
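As a starting point for such a benchmark, here is a rough sketch that times the two QR back ends available from base R on one full-rank matrix. It does not show how to relink R against a different BLAS, and it does not change what glm.fit calls internally. The sizes n and p and the data X and y are arbitrary assumptions for illustration.

```r
# Arbitrary full-rank test problem (sizes are assumptions, adjust to taste).
set.seed(1)
n <- 5000
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Time the default LINPACK-based QR against the LAPACK-based one.
t_linpack <- system.time(q_linpack <- qr(X))
t_lapack  <- system.time(q_lapack  <- qr(X, LAPACK = TRUE))
print(c(linpack = t_linpack[["elapsed"]], lapack = t_lapack[["elapsed"]]))

# Sanity check: both factorizations should give the same least-squares
# coefficients on a well-conditioned problem.
b_linpack <- qr.coef(q_linpack, y)
b_lapack  <- qr.coef(q_lapack, y)
print(all.equal(unname(b_linpack), unname(b_lapack)))

# Report the LAPACK version this R session is using.
print(La_version())
```

Any difference in timing will depend heavily on which BLAS the R installation is linked against, so results from one machine should not be generalized.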
> Even so, I doubt that the QR decomposition is the big time sink in
> glm.fit. Why not profile a representative fit and check? I did
> profile the glm.fit code a couple of years ago and discovered that a
> lot of time was being spent evaluating the various family functions
> like the inverse link and the variance function and that was because
> of calls to pmin and pmax.
What kind of profiling software should I use? Is Rprof in R able
to report which part of glm.fit is the bottleneck?
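For reference, a minimal profiling sketch using Rprof and summaryRprof from base R might look like the following. The simulated logistic-regression data (X, y, beta) and the number of repeated fits are assumptions chosen only so that the sampler records enough events; a representative real fit should be substituted.

```r
# Simulated logistic-regression data (purely illustrative).
set.seed(1)
n <- 20000
p <- 20
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta <- rnorm(p) / 4
y <- rbinom(n, 1, plogis(drop(X %*% beta)))

# Profile repeated calls to glm.fit so that enough samples are collected.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in 1:20) fit <- glm.fit(X, y, family = binomial())
Rprof(NULL)  # stop profiling

# Summarize time by function; by.self shows where time is spent directly.
prof <- summaryRprof(prof_file)
print(head(prof$by.self, 10))
```

The by.self table lists the R-level functions in which the sampler found the interpreter, which should indicate whether the QR step or the family functions dominate for a given problem.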
> Before trying to change very tricky Fortran code you owe it to
> yourself to check what the potential gains would be.
Thanks for the suggestions.
>> ----- Original Message -----
>> From: Ted <tchiang at sickkids.ca>
>> Date: Thursday, October 22, 2009 10:53 am
>> Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
>> To: "r-help at R-project.org" <r-help at r-project.org>
>>> I understand that the glm.fit calls LINPACK fortran routines instead of
>>> LAPACK because it can handle the 'rank deficiency problem'. If my data
>>> matrix is not rank deficient, would a glm.fit function which runs on
>>> LAPACK be faster? Would it be worthwhile to convert glm.fit to use
>>> LAPACK? Has anyone done this already? What is the best way to do this?
>>> I'm looking at very large datasets (thousands of glm calls), and would
>>> like to know if it's worth the effort for performance issues.
>>> Ted Chiang
>>> Bioinformatics Analyst
>>> Centre for Computational Biology
>>> Hospital for Sick Children, Toronto
>>> tchiang at sickkids.ca
>> R-help at r-project.org mailing list
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.