# [R] very fast OLS regression?

Bert Gunter gunter.berton at gene.com
Wed Mar 25 22:59:57 CET 2009

```lm is slow because it has to set up the design matrix (X) each time. See
?model.matrix and ?model.matrix.lm  for how to do this once separately from
lm (and then use lm.fit).

I am far from an expert on numerical algebra, but I'm pretty sure your
comments below are incorrect in the sense that "naive" methods are **not**
universally better then efficient numerical algebra methods using,say, QR
decompositions. It will depend very strongly on the size and specific nature
of the problem. Big Oh Complexity statements ( O(n) or O(n^2), etc.) are,
after all, asymptotic. There is also typically a tradeoff between
computational accuracy (especially for ill-conditioned matrices)  and speed
which your remarks seem to neglect.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of ivo welch
Sent: Wednesday, March 25, 2009 2:30 PM
To: Dimitris Rizopoulos
Cc: r-help
Subject: Re: [R] very fast OLS regression?

thanks, dimitris.  I also added Bill Dunlap's "solve(qr(x),y)"
function as ols5.   here is what I get in terms of speed on a Mac Pro:

ols1 6.779 3.591 10.37 0 0
ols2 0.515 0.21 0.725 0 0
ols3 0.576 0.403 0.971 0 0
ols4 1.143 1.251 2.395 0 0
ols5 0.683 0.565 1.248 0 0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).

regards,

/iaw

On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
<d.rizopoulos at erasmusmc.nl> wrote:
> check the following options:
>
> ols1 <- function (y, x) {
>    coef(lm(y ~ x - 1))
> }
>
> ols2 <- function (y, x) {
>    xy <- t(x)%*%y
>    xxi <- solve(t(x)%*%x)
>    b <- as.vector(xxi%*%xy)
>    b
> }
>
> ols3 <- function (y, x) {
>    XtX <- crossprod(x)
>    Xty <- crossprod(x, y)
>    solve(XtX, Xty)
> }
>
> ols4 <- function (y, x) {
>    lm.fit(x, y)\$coefficients
> }
>

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help