[R] Change in 'solve' for r-patched
Douglas Bates
bates at stat.wisc.edu
Thu Oct 30 20:01:13 CET 2003
The solve function in r-patched has been changed so that it applies a
tolerance when using Lapack routines to calculate the inverse of a
matrix or to solve a system of linear equations. A tolerance has
always been used with the Linpack routines but not with the Lapack
routines in versions 1.7.x and 1.8.0. (You can use the optional
argument tol = 0 to override this check for computational singularity
but you probably want to ask yourself why before doing so. In a
perfect world you would be required to pass a qualifier exam before
being allowed to do that. :-)
We found that making such a change with a not-unreasonable default
value for the tolerance caused several packages to fail R CMD check.
As a result we have changed to a very generous default tolerance and,
by default, declare the matrix to be computationally singular if the
estimated reciprocal condition number is less than
.Machine$double.eps. In future versions of R we may make that
tolerance tighter.
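To make the new behaviour concrete, here is a small sketch. The Hilbert matrix is just a convenient, notoriously ill-conditioned example (not one from the failing packages), and the exact error message may vary:

```r
## A classic ill-conditioned matrix: the n by n Hilbert matrix.
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))
H <- hilbert(13)
rcond(H)                   # estimated reciprocal condition number,
                           # well below .Machine$double.eps
try(solve(H))              # error: system is computationally singular
Hinv <- solve(H, tol = 0)  # overrides the check; the result is dubious
```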
A quick glance at the R CMD check failures resulting from the earlier
change shows that in most cases the calculation involved was something
like
solve(t(X) %*% X)
This is not good numerical linear algebra. Admittedly we all learn
that, for example, least squares estimates can be written as
(X'X)^{-1} X'y, and it seems reasonable to code this calculation as
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
but you shouldn't do that.
In terms of numerical stability and also in terms of efficiency, that
is a remarkably bad way of determining least squares estimates.
To begin with, you don't invert a matrix just to solve a system of
equations. If A is an n by n matrix and y is an n vector, then
solve(A)
is the equivalent of performing
solve(A, y)
n times. Thus solve(A) %*% y is more than n times as expensive as
solve(A, y). If you want A^{-1}y then write it as solve(A, y).
(In a perfect world you would be required to fill out a form, in
triplicate, explaining in detail exactly why there is no suitable
alternative to calculating the inverse before being allowed to use
solve(A).)
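As a rough illustration (A and y below are arbitrary simulated data, chosen only so the two forms can be compared):

```r
## Both forms give the same answer; only the cost differs.
set.seed(1)
n <- 500
A <- matrix(rnorm(n * n), n, n)
y <- rnorm(n)
x1 <- solve(A) %*% y   # forms the full inverse first: wasteful
x2 <- solve(A, y)      # solves the single system directly
all.equal(drop(x1), x2)       # should be TRUE, up to rounding
system.time(solve(A) %*% y)   # noticeably slower than ...
system.time(solve(A, y))      # ... solving the one system
```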
So, how should you calculate least squares estimates? We recommend
betahat <- qr.coef(qr(X), y)
If you want other results from the least squares calculation, such as
fitted values or residuals, you may want to save qr(X) so you can reuse it
qrX <- qr(X)
betahat <- qr.coef(qrX, y)
res <- qr.resid(qrX, y)
...
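A worked example on simulated data (the sizes and coefficients are arbitrary), checking the QR route against lm(), which uses the same decomposition internally:

```r
## Simulated regression: intercept plus two predictors.
set.seed(42)
X <- cbind(1, rnorm(100), rnorm(100))
y <- drop(X %*% c(2, -1, 0.5)) + rnorm(100)
qrX <- qr(X)                 # factor once, reuse for everything
betahat <- qr.coef(qrX, y)
fit <- qr.fitted(qrX, y)
res <- qr.resid(qrX, y)
all.equal(drop(betahat), unname(coef(lm(y ~ X - 1))))  # should be TRUE
all.equal(fit + res, y)                                # should be TRUE
```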
There are alternatives but
solve(t(X) %*% X) %*% t(X) %*% y
is never a good one. Seber and Lee discuss such calculations
at length in chapter 11 of their "Linear Regression Analysis (2nd ed)"
(Wiley, 2003).
Some other comments:
- the condition number of X'X is the square of the condition number
of X, which is why it is a good idea to avoid working with X'X
whenever possible
- on those rare occasions when you do want (X'X)^{-1}, say to form
a variance-covariance matrix, you should evaluate it as
chol2inv(qr.R(qr(X)))
- if, despite all of the above, you feel you must calculate X'X it is
best done as
crossprod(X)
and not as
t(X) %*% X
Similarly, X'y is best calculated as crossprod(X, y). In general
you should use crossprod in place of "t(anything) %*% anythingelse".
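The points above can be illustrated with arbitrary simulated data (any X would do):

```r
## Condition number of X'X is exactly the square of that of X
## (in the 2-norm, which kappa(..., exact = TRUE) computes via the SVD).
set.seed(7)
X <- cbind(1, rnorm(50), rnorm(50))
kappa(X, exact = TRUE)^2           # condition number of X, squared ...
kappa(crossprod(X), exact = TRUE)  # ... equals that of X'X

## (X'X)^{-1} via the R factor of the QR decomposition,
## without ever forming X'X and inverting it:
XtXinv <- chol2inv(qr.R(qr(X)))
all.equal(XtXinv, solve(crossprod(X)))  # should be TRUE
```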