[R] very fast OLS regression?
Thomas Lumley
tlumley at u.washington.edu
Thu Mar 26 08:22:07 CET 2009
On Wed, 25 Mar 2009, ivo welch wrote:
> thanks, dimitris. I also added Bill Dunlap's "solve(qr(x),y)"
> function as ols5. here is what I get in terms of speed on a Mac Pro:
>
> ols1 6.779 3.591 10.37 0 0
> ols2 0.515 0.21 0.725 0 0
> ols3 0.576 0.403 0.971 0 0
> ols4 1.143 1.251 2.395 0 0
> ols5 0.683 0.565 1.248 0 0
>
> so the naive matrix operations are fastest. I would have thought that
> alternatives to the naive stuff I learned in my linear algebra course
> would be quicker. still, ols3 and ols5 are competitive. the
> built-in lm() is really problematic. is ols3 (or perhaps even ols5)
> preferable in terms of accuracy? I think I can deal with 20% speed
> slow-down (but not with a factor 10 speed slow-down).
ols5 is more accurate in terms of round-off error, and so it is how the internal computations are done in lm and lsfit, but it is relatively unusual for this to matter given double precision.
If you want to repeat the regression with the same x and many y you can do much better by taking the QR decomposition just once and applying to a matrix of all the ys.
It's also possible that using chol() and backsolve() rather than solve() would speed up ols2 and ols3, at least for large enough matrices.
-thomas
> regards,
>
> /iaw
>
>
> On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
> <d.rizopoulos at erasmusmc.nl> wrote:
>> check the following options:
>>
>> ols1 <- function (y, x) {
>> coef(lm(y ~ x - 1))
>> }
>>
>> ols2 <- function (y, x) {
>> xy <- t(x)%*%y
>> xxi <- solve(t(x)%*%x)
>> b <- as.vector(xxi%*%xy)
>> b
>> }
>>
>> ols3 <- function (y, x) {
>> XtX <- crossprod(x)
>> Xty <- crossprod(x, y)
>> solve(XtX, Xty)
>> }
>>
>> ols4 <- function (y, x) {
>> lm.fit(x, y)$coefficients
>> }
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
More information about the R-help
mailing list