[R] very fast OLS regression?

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Wed Mar 25 22:11:11 CET 2009


check the following options:

ols1 <- function (y, x) {
     coef(lm(y ~ x - 1))
}

ols2 <- function (y, x) {
     xy <- t(x)%*%y
     xxi <- solve(t(x)%*%x)
     b <- as.vector(xxi%*%xy)
     b
}

ols3 <- function (y, x) {
     XtX <- crossprod(x)
     Xty <- crossprod(x, y)
     solve(XtX, Xty)
}

ols4 <- function (y, x) {
     lm.fit(x, y)$coefficients
}

# check timings
MC <- 500
N <- 10000

set.seed(0)
x <- matrix(rnorm(N*MC), N, MC)
y <- matrix(rnorm(N*MC), N, MC)

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))


I hope it helps.

Best,
Dimitris


ivo welch wrote:
> Dear R experts:
> 
> I just tried some simple test that told me that hand computing the OLS
> coefficients is about 3-10 times as fast as using the built-in lm()
> function.   (code included below.)  Most of the time, I do not care,
> because I like the convenience, and I presume some of the time goes
> into saving a lot of stuff that I may or may not need.  But when I do
> want to learn the properties of an estimator whose input contains a
> regression, I do care about speed.
> 
> What is the recommended fastest way to get regression coefficients in
> R?  (Is Gentlemen's weighted-least-squares algorithm implemented in a
> low-level C form somewhere?  that one was always lightning fast for
> me.)
> 
> regards,
> 
> /ivo
> 
> 
> 
> bybuiltin = function( y, x )   coef(lm( y ~ x -1 ));
> 
> byhand = function( y, x ) {
>   xy<-t(x)%*%y;
>   xxi<- solve(t(x)%*%x)
>   b<-as.vector(xxi%*%xy)
>   ## I will need these later, too:
>   ## res<-y-as.vector(x%*%b)
>   ## soa[i]<-b[2]
>   ## sigmas[i]<-sd(res)
>   b;
> }
> 
> 
> MC=500;
> N=10000;
> 
> 
> set.seed(0);
> x= matrix( rnorm(N*MC), nrow=N, ncol=MC );
> y= matrix( rnorm(N*MC), nrow=N, ncol=MC );
> 
> ptm = proc.time()
> for (mc in 1:MC) byhand(y[,mc],x[,mc]);
> cat("By hand took ", proc.time()-ptm, "\n");
> 
> ptm = proc.time()
> for (mc in 1:MC) bybuiltin(y[,mc],x[,mc]);
> cat("By built-in took ", proc.time()-ptm, "\n");
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014




More information about the R-help mailing list