# Even more timings [was Re: [R] Change in 'solve' for r-patched]

Douglas Bates bates at stat.wisc.edu
Sun Nov 2 19:00:35 CET 2003

```Earlier I posted some timings on different ways to solve a least
squares problem in R.  To get a comparison of the raw speed of the
Cholesky and the QR methods I wrote a couple of C functions (shown
below) that call Lapack and BLAS routines directly.

The results from these are
> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.46 0.00 0.46 0.00 0.00
> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.44 0.01 0.45 0.00 0.00
> all.equal(blsq, bsparse)
[1] TRUE
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.01 0.96 0.00 0.00
> all.equal(blsQR, bsparse)
[1] TRUE
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.03 0.97 0.00 0.00

For the dense matrix methods the timings, in increasing order, are

> system.time(blsq <- .Call("lsq_dense_Chol", mmd, y))
[1] 0.44 0.01 0.45 0.00 0.00
> system.time(bc2i <- chol2inv(chol(crossprod(mmd))) %*% crossprod(mmd, y))
[1] 0.60 0.02 0.63 0.00 0.00
> system.time(bLU <- solve(crossprod(mmd), crossprod(mmd, y)))
[1] 0.64 0.06 0.71 0.00 0.00
> system.time(blsQR <- .Call("lsq_dense_QR", mmd, y)[1:ncol(mmd),,drop=FALSE])
[1] 0.94 0.01 0.96 0.00 0.00
> system.time(bLA <- qr.coef(qr(mmd, LAPACK = TRUE), y))
[1] 3.57 0.04 3.61 0.00 0.00
> system.time(bQR <- qr.coef(qr(mmd), y))
[1] 3.70 0.11 3.81 0.00 0.00

Although only the last two can handle a rank deficient model matrix I
think there is a fairly easy way of modifying the code shown below to
detect apparent singularity and accomodate for it.

I think that the reason that the Linpack-based QR method is slow is
because it is based on Linpack, not Lapack, and the reason the qr(mmd,
LAPACK=TRUE) is slow is because it does full column pivoting.

It appears that QR is a little more than twice as slow as the Cholesky
and about 1.5 times as slow as the naive calculation using an LU
decomposition.

The team working on FLAME (http://www.cs.utexas.edu/users/flame) has
test versions of blocked algorithms for the QR and the Cholesky
decompositions that are faster than the Lapack code.  I expect to be
testing those in the next few weeks.

Here is the C code called above.

#include "Rdefines.h"
#include "R_ext/Lapack.h"

SEXP lsq_dense_Chol(SEXP X, SEXP y)
{
SEXP ans;
int info, n, p, k, *Xdims, *ydims;
double *xpx, d_one = 1., d_zero = 0.;

if (!(isReal(X) & isMatrix(X)))
error("X must be a numeric (double precision) matrix");
Xdims = INTEGER(coerceVector(getAttrib(X, R_DimSymbol), INTSXP));
n = Xdims[0];
p = Xdims[1];
if (!(isReal(y) & isMatrix(y)))
error("y must be a numeric (double precision) matrix");
ydims = INTEGER(coerceVector(getAttrib(y, R_DimSymbol), INTSXP));
if (ydims[0] != n)
error(
"number of rows in y (%d) does not match number of rows in X (%d)",
ydims[0], n);
k = ydims[1];
if (k < 1 || p < 1) return allocMatrix(REALSXP, p, k);
ans = PROTECT(allocMatrix(REALSXP, p, k));
F77_CALL(dgemm)("T", "N", &p, &k, &n, &d_one, REAL(X), &n, REAL(y), &n,
&d_zero, REAL(ans), &p);
xpx = Calloc(p * p, double);
F77_CALL(dsyrk)("U", "T", &p, &n, &d_one, REAL(X), &n, &d_zero,
xpx, &p);
F77_CALL(dposv)("U", &p, &k, xpx, &p, REAL(ans), &p, &info);
if (info) error("Lapack routine dposv returned error code %d", info);
Free(xpx);
UNPROTECT(1);
return ans;
}

SEXP lsq_dense_QR(SEXP X, SEXP y)
{
SEXP ans;
int info, n, p, k, *Xdims, *ydims, lwork;
double *work, d_one = 1., d_zero = 0., tmp, *xvals;

if (!(isReal(X) & isMatrix(X)))
error("X must be a numeric (double precision) matrix");
Xdims = INTEGER(coerceVector(getAttrib(X, R_DimSymbol), INTSXP));
n = Xdims[0];
p = Xdims[1];
if (!(isReal(y) & isMatrix(y)))
error("y must be a numeric (double precision) matrix");
ydims = INTEGER(coerceVector(getAttrib(y, R_DimSymbol), INTSXP));
if (ydims[0] != n)
error(
"number of rows in y (%d) does not match number of rows in X (%d)",
ydims[0], n);
k = ydims[1];
if (k < 1 || p < 1) return allocMatrix(REALSXP, p, k);
xvals = Calloc(n * p, double);
Memcpy(xvals, REAL(X), n * p);
ans = PROTECT(duplicate(y));
lwork = -1;
F77_CALL(dgels)("N", &n, &p, &k, xvals, &n, REAL(ans), &n,
&tmp, &lwork, &info);
if (info)
error("First call to Lapack routine dgels returned error code %d",
info);
lwork = (int) tmp;
work = Calloc(lwork, double);
F77_CALL(dgels)("N", &n, &p, &k, xvals, &n, REAL(ans), &n,
work, &lwork, &info);
if (info)
error("Second call to Lapack routine dgels returned error code %d",
info);
Free(work); Free(xvals);
UNPROTECT(1);
return ans;
}

```