[R] Change in 'solve' for r-patched
Douglas Bates
bates at stat.wisc.edu
Thu Oct 30 20:01:13 CET 2003
The solve function in r-patched has been changed so that it applies a
tolerance when using Lapack routines to calculate the inverse of a
matrix or to solve a system of linear equations. A tolerance has
always been used with the Linpack routines but not with the Lapack
routines in versions 1.7.x and 1.8.0. (You can use the optional
argument tol = 0 to override this check for computational singularity
but you probably want to ask yourself why before doing so. In a
perfect world you would be required to pass a qualifier exam before
being allowed to do that. :-)
We found that making such a change with a not-unreasonable default
value for the tolerance caused several packages to fail R CMD check.
As a result we have changed to a very generous default tolerance and,
by default, declare the matrix to be computationally singular if the
estimated reciprocal condition number is less than
.Machine$double.eps. In future versions of R we may make that
tolerance tighter.
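To make the new behaviour concrete, here is a small sketch. The Hilbert matrix is just a convenient, notoriously ill-conditioned example (not one from the failing packages), and the exact error message may vary:

```r
## A classic ill-conditioned matrix: the n by n Hilbert matrix.
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))
H <- hilbert(13)
rcond(H)                   # estimated reciprocal condition number,
                           # well below .Machine$double.eps
try(solve(H))              # error: system is computationally singular
Hinv <- solve(H, tol = 0)  # overrides the check; the result is dubious
```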
A quick glance at the R CMD check failures resulting from the earlier
change shows that in most cases the calculation involved was something
like
solve(t(X) %*% X)
This is not good numerical linear algebra. Admittedly we all learn
that, for example, least squares estimates can be written as
(X'X)^{-1} X'y, and it seems reasonable to code this calculation as
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
but you shouldn't do that.
In terms of numerical stability and also in terms of efficiency, that
is a remarkably bad way of determining least squares estimates.
To begin with, you don't invert a matrix just to solve a system of
equations. If A is an n by n matrix and y is an n vector, then
solve(A)
is the equivalent of performing
solve(A, y)
n times. Thus solve(A) %*% y is more than n times as expensive as
solve(A, y). If you want A^{-1}y then write it as solve(A, y).
(In a perfect world you would be required to fill out a form, in
triplicate, explaining in detail exactly why there is no suitable
alternative to calculating the inverse before being allowed to use
solve(A).)
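As a rough illustration (A and y below are arbitrary simulated data, chosen only so the two forms can be compared):

```r
## Both forms give the same answer; only the cost differs.
set.seed(1)
n <- 500
A <- matrix(rnorm(n * n), n, n)
y <- rnorm(n)
x1 <- solve(A) %*% y   # forms the full inverse first: wasteful
x2 <- solve(A, y)      # solves the single system directly
all.equal(drop(x1), x2)       # should be TRUE, up to rounding
system.time(solve(A) %*% y)   # noticeably slower than ...
system.time(solve(A, y))      # ... solving the one system
```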
So, how should you calculate least squares estimates? We recommend
betahat <- qr.coef(qr(X), y)
If you want other results from the least squares calculation, such as
fitted values or residuals, you may want to save qr(X) so you can reuse it
qrX <- qr(X)
betahat <- qr.coef(qrX, y)
res <- qr.resid(qrX, y)
...
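A worked example on simulated data (the sizes and coefficients are arbitrary), checking the QR route against lm(), which uses the same decomposition internally:

```r
## Simulated regression: intercept plus two predictors.
set.seed(42)
X <- cbind(1, rnorm(100), rnorm(100))
y <- drop(X %*% c(2, -1, 0.5)) + rnorm(100)
qrX <- qr(X)                 # factor once, reuse for everything
betahat <- qr.coef(qrX, y)
fit <- qr.fitted(qrX, y)
res <- qr.resid(qrX, y)
all.equal(drop(betahat), unname(coef(lm(y ~ X - 1))))  # should be TRUE
all.equal(fit + res, y)                                # should be TRUE
```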
There are alternatives but
solve(t(X) %*% X) %*% t(X) %*% y
is never a good one. Seber and Lee discuss such calculations
at length in chapter 11 of their "Linear Regression Analysis (2nd ed)"
(Wiley, 2003).
Some other comments:
- the condition number of X'X is the square of the condition number
of X, which is why it is a good idea to avoid working with X'X
whenever possible
- on those rare occasions when you do want (X'X)^{-1}, say to form
a variance-covariance matrix, you should evaluate it as
chol2inv(qr.R(qr(X)))
- if, despite all of the above, you feel you must calculate X'X it is
best done as
crossprod(X)
and not as
t(X) %*% X
Similarly, X'y is best calculated as crossprod(X, y). In general
you should use crossprod in place of "t(anything) %*% anythingelse".
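The points above can be illustrated with arbitrary simulated data (any X would do):

```r
## Condition number of X'X is exactly the square of that of X
## (in the 2-norm, which kappa(..., exact = TRUE) computes via the SVD).
set.seed(7)
X <- cbind(1, rnorm(50), rnorm(50))
kappa(X, exact = TRUE)^2           # condition number of X, squared ...
kappa(crossprod(X), exact = TRUE)  # ... equals that of X'X

## (X'X)^{-1} via the R factor of the QR decomposition,
## without ever forming X'X and inverting it:
XtXinv <- chol2inv(qr.R(qr(X)))
all.equal(XtXinv, solve(crossprod(X)))  # should be TRUE
```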