[R] What BIC is calculated by 'regsubsets'?
Paul Murtaugh
murtaugh at stat.oregonstate.edu
Fri Dec 19 20:39:17 CET 2008
The function 'regsubsets' appears to calculate a BIC value that is
different from that returned by the function 'BIC'. The latter is
explained in the documentation, but I can't find an expression for the
statistic returned by 'regsubsets'.
Incidentally, both of these differ from the BIC that is given in Ramsey
and Schafer's, The Statistical Sleuth. I assume these are all linear
transformations of each other, but I'd like to know the 'regsubsets'
formula (so that I can develop a way to do all-subsets selection based
on the AIC rather than the BIC).
The following code defines a function that illustrates the issue.
Thanks
-Paul
script.ic <- function() {
library(datasets)
print(names(airquality)) # Ozone Solar.R Wind Temp Month Day
# Fit a model with two predictors
mod1 <- lm(Ozone ~ Wind + Temp, data=airquality)
npar <- length(mod1$coef)+1 # no. parameters in fitted model,
# including s2, is 4
nobs <- length(mod1$fitted) # no. of observations = 116
s2 <- summary(mod1)$sigma2 # MSE = 477.6371
logL <- as.vector(logLik(mod1)) # log likelihood = -520.8705
# Use the R function BIC, defined as: -2*log-likelihood + npar*log(nobs)
tmp1 <- BIC(mod1) # 1060.755
tmp2 <- -2 * (-520.8705) + 4 * log(116) # 1060.755, agrees
cat(paste("\nFrom R's BIC:",signif(tmp1,5),"(",signif(tmp2,5),
"obtained 'by hand')\n\n"))
# Now see how 'regsubsets' calculates the BIC
tmp3 <- regsubsets(Ozone ~ Solar.R + Wind + Temp, data=airquality)
tmp3.s <- summary(tmp3)
# 'mod1' is the second model in 'tmp3'; what is the formula for this BIC?
cat("\nThe corresponding model from 'regsubsets':\n")
print(tmp3.s$which[2,])
tmp4 <- tmp3.s$bic[2] # -82.52875
cat(paste("\nBIC =",signif(tmp4,5),"\n"))
# Incidentally, the 'rsq' and 'rss' components of tmp3.s do not agree
# with the values in the 'mod1' lm output object.
# Just for kicks, try the formula for Schwarz's BIC from Ramsey & Schafer,
# Statistical Sleuth: nobs*log(MSE) + npar*log(nobs))
tmp5 <- 116 * log(477.6371) + 4*log(116) # 734.6011
cat(paste("\nStat. Sleuth's BIC =",signif(tmp5,5),"\n"))
cat("\nUff da!\n")
browser()
}
More information about the R-help
mailing list