[R] Change in 'solve' for r-patched
Douglas Bates
bates at stat.wisc.edu
Sat Nov 1 19:08:59 CET 2003
Thomas Lumley <tlumley at u.washington.edu> writes:
> On Fri, 31 Oct 2003, Williams, Elliot - BLS wrote:
> >
> > None of these address the problem of numerical accuracy, which was the
> > thrust of Doug's comments. Has anyone done a quick simulation to
> > investigate the stability of the solutions? Is it small coefficients or
> > near-collinearity (or both) that one has to worry about?
> >
> > Are the solve(crossprod()) methods obviously unstable? They surely do work
> > quickly!
> >
>
> They are usually not *very* unstable. That is, solve(crossprod())
> potentially loses about twice as many digits to rounding error as Doug's
> proposals. In many cases this is not that serious -- if you have 15 digits
> accuracy then losing 2 instead of 1 (or even 8 instead of 4) of them may
> not be a big deal. Back in the days of single precision this was more
> obviously important and you would get visible differences between
> statistical packages based on their linear algebra routines. (There was a
> notorious econometric data set that caused problems in single precision
> with the solve(crossprod()) approach but was fine with QR decomposition).
>
> On the other hand, even with double precision in extreme situations the
> harm can be noticeable, and the cost of working with QR or Cholesky
> decompositions instead is fairly small (and often negative -- with larger
> matrices you would probably see the Cholesky-based method being faster).
>
> In linear regression the problem happens with collinearity, or when one
> variable has much smaller variance than another. In both of these cases
> the eigenvalues of the information matrix are of very different sizes. The
> ratio of the largest to the smallest eigenvalue is called the condition
> number, and the tolerance in solve() is based on condition numbers.
>
> So how bad is the error? For any matrix M there exists some vector b and
> error e in b so that the relative error in solve(M,b+e) compared to
> solve(M,b) is the condition number of the matrix M times the relative
> error in b. This means that condition numbers greater than 10^7 (and
> finding these in package examples is what prompted Doug's message) can
> amplify numerical error ten million times. This is starting to get
> undesirable even when you have 15 digits to work with.
>
> Working with very ill-conditioned matrices is possible, but it requires
> care in tracking and bounding errors, and if I wanted to do that I'd be in
> applied math, not biostatistics. That's why a simple fix that reduces
> condition numbers from, say, 10^10 to 10^5 really is worthwhile,
> especially for software that will be used by other people who don't
> understand its internals.
As Thomas indicates, the numerical distinction between using X'X and
using X directly for least squares calculations is important in some
cases but not very important in most.
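As a rough illustration of the condition-number point (a sketch with
made-up values, not the example below): forming X'X essentially squares
the condition number, and simply centering and rescaling badly scaled
columns brings it back down.

    x <- 100:119                          ## a badly scaled covariate
    X <- cbind(1, x, x^2)                 ## intercept, x, and x^2: nearly collinear columns
    kappa(X, exact = TRUE)                ## condition number of X
    kappa(crossprod(X), exact = TRUE)     ## essentially kappa(X)^2
    Xs <- cbind(1, scale(x), scale(x^2))  ## centered and scaled columns
    kappa(Xs, exact = TRUE)               ## far smaller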
Regarding timings, I will agree that if mm is a model matrix and y is
the response then
solve(crossprod(mm), crossprod(mm,y))
is very fast, especially if R is using ATLAS (or, in my case, Goto's
BLAS). The crossprod and the solve calculations are very highly
optimized.
However, in current versions of R
chol2inv(chol(crossprod(mm))) %*% crossprod(mm, y)
is competitive in terms of timing, should be slightly more stable (it
uses the Cholesky decomposition of the positive-definite matrix
crossprod(mm) rather than an LU decomposition), and also produces the
(X'X)^{-1} matrix.
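(A minimal sketch on simulated data, using made-up names X and y rather
than the example below, of what having (X'X)^{-1} buys you: the same
matrix that gives the coefficients also gives their standard errors.)

    set.seed(123)                                   ## simulated data, for illustration only
    X <- cbind(1, matrix(rnorm(100 * 4), 100, 4))
    y <- drop(X %*% c(1, 2, 0, -1, 0.5)) + rnorm(100)
    XtXinv <- chol2inv(chol(crossprod(X)))          ## (X'X)^{-1} via the Cholesky factor
    b <- XtXinv %*% crossprod(X, y)                 ## least squares coefficients
    s2 <- sum((y - drop(X %*% b))^2) / (nrow(X) - ncol(X))
    se <- sqrt(s2 * diag(XtXinv))                   ## standard errors from (X'X)^{-1}
    all.equal(drop(b), unname(coef(lm(y ~ X - 1)))) ## should agree with lm()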
As an example, I use the model matrix and response from an example in
Koenker and Ng's SparseM package. The computer is a 2.0 GHz
Pentium 4 (512 KB L2 cache) running Debian GNU/Linux. R is using
Goto's BLAS. The mmd matrix is a dense version of the model matrix.
By far the fastest timing uses sparse matrix techniques on the sparse
model matrix mm, but that is a different issue.
> str(mm)
list()
- attr(*, "p")= int [1:713] 0 13 17 26 38 43 52 56 61 67 ...
- attr(*, "i")= int [1:8758] 0 2 25 27 163 165 1258 1261 1276 1278 ...
- attr(*, "x")= num [1:8758] 0.277 0.277 0.277 0.277 0.277 ...
- attr(*, "nrow")= int 1850
- attr(*, "class")= atomic [1:1] cscMatrix
..- attr(*, "package")= chr "taucs"
> str(y)
num [1:1850] 64.07 5.88 64.03 5.96 76.41 ...
> mmd = as(mm, "matrix")
> dim(mmd)
[1] 1850 712
> str(mmd)
num [1:1850, 1:712] 0.277 0.000 0.277 0.000 0.000 ...
> system.time(bsparse <- solve(crossprod(mm), crossprod(mm,y)))
[1] 0.05 0.00 0.05 0.00 0.00
> system.time(bsparse <- solve(crossprod(mm), crossprod(mm,y)))
[1] 0.05 0.00 0.07 0.00 0.00
> str(bsparse)
num [1:712, 1] 823 340 473 349 188 ...
It turns out that qr(mmd) is considerably slower than crossprod(mmd).
I had thought that this might be because the default for qr is to use
Linpack code, which only uses level-1 BLAS. However, even with
LAPACK=TRUE it is slow. Presently the LAPACK=TRUE version in R does
full column pivoting and the LAPACK=FALSE version only pivots on
apparent singularity. It may be that gains from using level-3 BLAS in
the LAPACK version are offset by the expense of doing the full pivoting.
(I will be experimenting with a LAPACK-based version that does
pivoting on apparent singularity but it will take some time to work out
all the details.)
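(To see the difference in pivoting strategies, here is a small sketch on
a deliberately rank-deficient toy matrix; the values are made up and not
related to mmd.)

    set.seed(42)                                    ## toy example, illustrative only
    Xdef <- cbind(1, 1:10, 2 * (1:10), rnorm(10))   ## third column is a multiple of the second
    qr(Xdef)$pivot                                  ## Linpack: pivots only on apparent singularity
    qr(Xdef)$rank                                   ## the rank deficiency (rank 3) is detected
    qr(Xdef, LAPACK = TRUE)$pivot                   ## LAPACK: reorders columns freely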
> system.time(crossprod(mmd))
[1] 0.38 0.01 0.40 0.00 0.00
> system.time(crossprod(mmd))
[1] 0.37 0.01 0.38 0.00 0.00
> system.time(qr(mmd))
[1] 3.66 0.06 3.73 0.00 0.00
> system.time(qr(mmd))
[1] 3.57 0.07 3.64 0.00 0.00
> system.time(qr(mmd, LAPACK = TRUE))
[1] 3.50 0.05 3.57 0.00 0.00
> system.time(qr(mmd, LAPACK = TRUE))
[1] 3.51 0.04 3.55 0.00 0.00
A solution of the least squares system using the Cholesky decomposition
and chol2inv is relatively fast. In fact, on this machine it is
marginally faster than using the LU-based solve, and it produces
(X'X)^{-1}, in case you need that.
> system.time(bc2i <- chol2inv(chol(crossprod(mmd))) %*% crossprod(mmd, y))
[1] 0.60 0.02 0.63 0.00 0.00
> system.time(bc2i <- chol2inv(chol(crossprod(mmd))) %*% crossprod(mmd, y))
[1] 0.60 0.03 0.63 0.00 0.00
> all.equal(bsparse, bc2i)
[1] TRUE
> system.time(bLU <- solve(crossprod(mmd), crossprod(mmd, y)))
[1] 0.65 0.06 0.71 0.00 0.00
> system.time(bLU <- solve(crossprod(mmd), crossprod(mmd, y)))
[1] 0.64 0.06 0.71 0.00 0.00
> str(bLU)
num [1:712, 1] 823 340 473 349 188 ...
> all.equal(bLU, bsparse)
[1] TRUE
Methods using the QR decompositions are much slower but should be more
accurate.
> system.time(bQR <- qr.coef(qr(mmd), y))
[1] 3.95 0.10 4.06 0.00 0.00
> system.time(bQR <- qr.coef(qr(mmd), y))
[1] 3.70 0.11 3.81 0.00 0.00
> str(bQR)
num [1:712] 823 340 473 349 188 ...
> all.equal(bQR, bsparse[,1])
[1] TRUE
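(For the accuracy side of that trade-off, a small sketch on a
deliberately ill-conditioned polynomial-style basis with known
coefficients; the construction is mine and not part of the example
above.)

    xg <- seq(0, 1, length.out = 50)
    Xp <- outer(xg, 0:6, "^")                       ## raw monomial basis, badly conditioned
    bTrue <- rep(1, 7)                              ## known coefficients
    yp <- drop(Xp %*% bTrue)                        ## exact, noise-free response
    bNE <- solve(crossprod(Xp), crossprod(Xp, yp))  ## normal-equations route
    bQR <- qr.coef(qr(Xp), yp)                      ## QR route
    max(abs(bNE - bTrue))                           ## typically orders of magnitude larger ...
    max(abs(bQR - bTrue))                           ## ... than the error from the QR route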
*Warning* Don't try the next timing in R-1.8.0 or earlier. Wait for
R-1.8.1. There is a bug in the code called by qr.coef when given a
LAPACK-based qr object. I have committed a fix to r-devel and will
also fix r-patched (a.k.a. R-1.8.1 alpha).
> system.time(bLA <- qr.coef(qr(mmd, LAPACK = TRUE), y))
[1] 3.57 0.03 3.61 0.00 0.00
> system.time(bLA <- qr.coef(qr(mmd, LAPACK = TRUE), y))
[1] 3.57 0.04 3.61 0.00 0.00
> all.equal(bLA, bsparse)
[1] "target is numeric, current is matrix"
> all.equal(bLA, bsparse[, 1])
[1] TRUE