[R] Model validation and penalization with rms package
Mark Seeto
markseeto at gmail.com
Tue Jun 29 03:31:04 CEST 2010
I’ve been using Frank Harrell’s rms package to do bootstrap model
validation. Is it the case that the optimum penalization may still
give a model which is substantially overfitted?
I calculated corrected R^2, optimism in R^2, and corrected slope for
various penalties for a simple example:
x1 <- rnorm(45)
x2 <- rnorm(45)
x3 <- rnorm(45)
y <- x1 + 2*x2 + rnorm(45,0,3)
ols0 <- ols(y ~ x1 + x2 + x3, x=TRUE, y=TRUE)
corrected.Rsq <- rep(0,60)
optimism.Rsq <- rep(0,60)
corrected.slope <- rep(0,60)
for (pen in 1:60) {
olspen <- ols(y ~ x1 + x2 + x3, penalty=pen, x=TRUE, y=TRUE)
val <- validate(olspen, B=200)
corrected.Rsq[pen] <- val["R-square", "index.corrected"]
optimism.Rsq[pen] <- val["R-square", "optimism"]
corrected.slope[pen] <- val["Slope", "index.corrected"]
}
plot(corrected.Rsq)
x11(); plot(optimism.Rsq)
x11(); plot(corrected.slope)
p <- pentrace(ols0, penalty=1:60)
ols9 <- ols(y ~ x1 + x2 + x3, penalty=9, x=TRUE, y=TRUE)
validate(ols9, B=200)
index.orig training test optimism index.corrected n
R-square 0.2523404 0.2820734 0.1911878 0.09088563 0.1614548 200
MSE 7.8497722 7.3525300 8.4918212 -1.13929116 8.9890634 200
Intercept 0.0000000 0.0000000 -0.1353572 0.13535717 -0.1353572 200
Slope 1.0000000 1.0000000 1.1707137 -0.17071372 1.1707137 200
pentrace tells me that of the penalties 1, 2,..., 60, corrected AIC is
maximised by a penalty of 9. This is consistent with the corrected R^2
plot, which shows a maximum somewhere around 10. However, a penalty of
9 still gives an R^2 optimism of 0.09 (training R^2=0.28, test
R^2=0.19), suggesting overfitting.
Do we just have to live with this R^2 optimism? It can be decreased by
taking a bigger penalty, but then the corrected R^2 is reduced. Also,
a penalty of 9 gives a corrected slope of about 1.17 (corrected slope
of 1 is achieved with a penalty of about 1 or 2).
Thanks for any help/advice you can give.
Mark
--
Mark Seeto
Statistician
National Acoustic Laboratories
A Division of Australian Hearing
More information about the R-help
mailing list