[R] very fast OLS regression?
Bernardo Rangel Tura
tura at centroin.com.br
Thu Mar 26 11:00:07 CET 2009
On Wed, 2009-03-25 at 22:11 +0100, Dimitris Rizopoulos wrote:
> check the following options:
>
> ols1 <- function (y, x) {
> coef(lm(y ~ x - 1))
> }
>
> ols2 <- function (y, x) {
> xy <- t(x)%*%y
> xxi <- solve(t(x)%*%x)
> b <- as.vector(xxi%*%xy)
> b
> }
>
> ols3 <- function (y, x) {
> XtX <- crossprod(x)
> Xty <- crossprod(x, y)
> solve(XtX, Xty)
> }
>
> ols4 <- function (y, x) {
> lm.fit(x, y)$coefficients
> }
>
> # check timings
> MC <- 500
> N <- 10000
>
> set.seed(0)
> x <- matrix(rnorm(N*MC), N, MC)
> y <- matrix(rnorm(N*MC), N, MC)
>
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
>
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
>
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
>
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))
Hi Dimitris and Ivo
I think this is not a fair comparison, look this
x[8,100]<-NA
system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
user system elapsed
8.765 0.000 8.762
system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
Error in solve.default(t(x) %*% x) :
system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.002
system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
Error in solve.default(XtX, Xty) :
system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.001
system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))
Error in lm.fit(x, y) : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0 0 0.001
So routines ols2, ols3 and ols4 only functional in fully matrix if have
one NA this functions don't run.
--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil
More information about the R-help
mailing list