[R] ridge regression - covariance matrices of ridge coefficients
Michael Friendly
friendly at yorku.ca
Sat Aug 6 20:12:43 CEST 2011
For an application of ridge regression, I need to get the covariance
matrices of the estimated regression
coefficients in addition to the coefficients for all values of the ridge
contstant, lambda.
I've studied the code in MASS:::lm.ridge, but don't see how to do this
because the code is vectorized using
one svd calculation. The relevant lines from lm.ridge, using X, Y are:
Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% Y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
lsfit <- X %*% lscoef
resid <- Y - lsfit
...
k <- length(lambda)
dx <- length(d)
div <- d^2 + rep(lambda, rep(dx, k))
a <- drop(d * rhs)/div
dim(a) <- c(dx, k)
coef <- Xs$v %*% a
dimnames(coef) <- list(names(Xscale), format(lambda))
Below is a naive, iterative function, which seems correct to me (though
it omits the intercept)
but it does not give the same estimated coefficients
as lm.ridge. A test case is included at the end. Can someone help me
resolve the discrepancy?
ridge <- function(y, X, lambda=0){
#dimensions
n <- nrow(X)
p <- ncol(X)
#center X and y
Xm <- colMeans(X)
ym <- mean(y)
X <- X - rep(Xm, rep(n, p))
y <- y - ym
#scale X, as in MASS::lm.ridge
Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
X <- as.matrix(X/rep(Xscale, rep(n, p)))
XPX <- crossprod(X)
XPy <- crossprod(X,y)
I <- diag(p)
lmfit <- lm.fit(X, y)
MSE <- sum(lmfit$residuals^2) / (n-p)
# prepare output
coef <- matrix(0, length(lambda), p)
cov <- as.list(rep(0, length(lambda)))
mse <- rep(0, length(lambda))
# loop over lambdas
for(i in seq(length(lambda))) {
lam <- lambda[i]
XPXr <- XPX + lam * I
XPXI <- solve(XPXr)
coef[i,] <- XPXI %*% XPy
cov[[i]] <- MSE * XPXI %*% XPX %*% XPXI
res <- y - X %*% coef[i,]
mse[i] <- sum(res^2) / (n-p)
dimnames(cov[[i]]) <- list(colnames(X), colnames(X))
}
dimnames(coef) <- list(format(lambda), colnames(X))
result <- list(lambda=lambda, coef=coef, cov=cov, mse=mse,
scales=Xscale)
class(result) <- "ridge"
result
}
# Test:
> lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
> lambdaf <- c("", ".005", ".01", ".02", ".04", ".08")
> lridge <- ridge(longley.y, longley.X, lambda=lambda)
> lridge$coef
GNP Unemployed Armed.Forces Population Year GNP.deflator
0.000 -3.4471925 -1.827886 -0.6962102 -0.34419721 8.431972 0.15737965
0.005 -1.0424783 -1.491395 -0.6234680 -0.93558040 6.566532 -0.04175039
0.010 -0.1797967 -1.361047 -0.5881396 -1.00316772 5.656287 -0.02612152
0.020 0.4994945 -1.245137 -0.5476331 -0.86755299 4.626116 0.09766305
0.040 0.9059471 -1.155229 -0.5039108 -0.52347060 3.576502 0.32123994
0.080 1.0907048 -1.086421 -0.4582525 -0.08596324 2.641649 0.57025165
> (lmridge <- lm.ridge(Employed ~ GNP + Unemployed + Armed.Forces +
Population + Year + GNP.deflator, lambda=lambda, data=longley))
GNP Unemployed Armed.Forces Population Year
0.000 -3482.259 -0.035819179 -0.02020230 -0.010332269 -0.05110411 1.8291515
0.005 -2690.238 -0.010832211 -0.01648330 -0.009252720 -0.13890874 1.4244808
0.010 -2307.348 -0.001868237 -0.01504267 -0.008728422 -0.14894365 1.2270208
0.020 -1877.437 0.005190160 -0.01376159 -0.008127275 -0.12880848 1.0035455
0.040 -1442.709 0.009413540 -0.01276791 -0.007478405 -0.07772142 0.7758522
0.080 -1057.555 0.011333324 -0.01200742 -0.006800801 -0.01276325 0.5730541
GNP.deflator
0.000 0.015061872
0.005 -0.003995682
0.010 -0.002499936
0.020 0.009346751
0.040 0.030743969
0.080 0.054575402
>
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
More information about the R-help
mailing list