[R] AIC function and Step function
Steven McKinney
smckinney at bccrc.ca
Fri Nov 28 07:19:06 CET 2008
Hi Dana,
Of course the only true way to know what the
AIC calculations are is to read the source
code. From within R, what with namespaces,
that is becoming increasingly difficult.
The AIC() function is not too hard to find
from R.
> AIC
function (object, ..., k = 2)
UseMethod("AIC")
<environment: namespace:stats>
So now we know that AIC is in the stats package.
> objects("package:stats", all=TRUE)
[1] ".MFclass" ".checkMFClasses" ".getXlevels" "AIC"
[5] "ARMAacf" "ARMAtoMA" "Box.test" "C"
so all we are seeing is the AIC generic method shown above.
Very often an S3 method will have a 'default'
method, so let's see if we get lucky:
> stats:::AIC.default
function (object, ..., k = 2)
{
ll <- if ("stats4" %in% loadedNamespaces())
stats4:::logLik
else logLik
if (length(list(...))) {
object <- list(object, ...)
val <- lapply(object, ll)
val <- as.data.frame(t(sapply(val, function(el) c(attr(el,
"df"), AIC(el, k = k)))))
names(val) <- c("df", "AIC")
Call <- match.call()
Call$k <- NULL
row.names(val) <- as.character(Call[-1])
val
}
else AIC(ll(object), k = k)
}
<environment: namespace:stats>
so there is the basic AIC method.
There are some references to logLik, so
let's look for a 'logLik' method too.
> stats:::AIC.logLik
function (object, ..., k = 2)
-2 * c(object) + k * attr(object, "df")
<environment: namespace:stats>
If anyone knows how to reveal all these methods
without having to guess, I'd like to know of easier
ways to find these methods. objects() doesn't
reveal them, showMethods() doesn't appear to,
and in my attempts, this occurred:
> methods(class = "AIC")
Error : cannot allocate vector of size 1.9 Gb
R(352,0xa0350074) malloc: *** mmap(size=2023526400) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error in as.environment(pos) :
no item called "get(".__S3MethodsTable__.", envir = asNamespace(i))" on the search list
R(352,0xa0350074) malloc: *** mmap(size=2023526400) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
>
So it's just not easy finding these functions at the
command prompt.
Thankfully R is open source, so I downloaded the latest
R-2.8.0.tar.gz to directory
/Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/
and untarred it all. Navigate to
/Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R
and you're looking at all the stats package R scripts. There's
155 files in there, one named AIC.R so that looks promising.
Look in AIC.R and you'll see the two functions shown above,
but not the extractAIC bunch. Where to look for the rest?
There are many ways to look for extractAIC - you
could grep the files from a unix shell, etc.
Here's a script I use to allow easier perusal of
R source code. This script will concatenate all
the R source files in a directory into one file,
with file name and path descriptors between
files in the output.
### 8< ------------------------ ###
inputDir <- "/Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R"
outputDir <- paste(inputDir, "/temp", sep = "")
outputFile <- "AllRScriptFiles.R"
outputDirFile <- paste(outputDir, outputFile, sep = "/")
if ( !file.exists(outputDir) ) {
dir.create(outputDir)
}
cat(paste("### All files from dir", inputDir, "\n\n"), file = outputDirFile)
## Get list of all files in dir
fileNames <- list.files(path = inputDir, pattern = "\\.R$", full.names = TRUE)
ofcon <- file(outputDirFile, open = "at")
for (fi in seq(along = fileNames)) {
fni <- fileNames[fi]
fc <- readLines(fni, n = -1)
cat(paste("\n\n##### File", fni, " #####\n\n"), file = outputDirFile, append = TRUE)
writeLines(fc, con = ofcon)
flush(ofcon)
}
close(ofcon)
### 8< ------------------------ ###
After running this script, navigate to
/Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R/temp
with your favourite editor, and edit the file AllRScriptFiles.R
(Of course you will substitute your own directory names in all this.)
Find extractAIC, and you'll see it came from file
add.R
You'll see extractAIC methods for classes coxph, survreg, glm, ...
and you can see all the calculations.
extractAIC <- function(fit, scale, k = 2, ...) UseMethod("extractAIC")
extractAIC.coxph <- function(fit, scale, k = 2, ...)
{
## seems that coxph sometimes gives one and sometimes gives two values
## for loglik
edf <- length(fit$coef)
loglik <- fit$loglik[length(fit$loglik)]
c(edf, -2 * loglik + k * edf)
}
extractAIC.survreg <- function(fit, scale, k = 2, ...)
{
edf <- sum(fit$df)
c(edf, -2 * fit$loglik[2] + k * edf)
}
extractAIC.glm <- function(fit, scale = 0, k = 2, ...)
{
n <- length(fit$residuals)
edf <- n - fit$df.residual
aic <- fit$aic
c(edf, aic + (k-2) * edf)
}
extractAIC.lm <- function(fit, scale = 0, k = 2, ...)
{
n <- length(fit$residuals)
edf <- n - fit$df.residual
RSS <- deviance.lm(fit)
dev <- if(scale > 0) RSS/scale - n else n * log(RSS/n)
c(edf, dev + k * edf)
}
extractAIC.aov <- extractAIC.lm
extractAIC.negbin <- function(fit, scale, k = 2, ...)
{
n <- length(fit$residuals)
edf <- n - fit$df.residual
c(edf, -fit$twologlik + k * edf)
}
HTH
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
-----Original Message-----
From: r-help-bounces at r-project.org on behalf of Dana77
Sent: Thu 11/27/2008 6:11 PM
To: r-help at r-project.org
Subject: [R] AIC function and Step function
I would like to figure out the equations for calculating "AIC" in both
"step() function" and "AIC () function". They are different. Then I
just type "step" in the R console, and found the "AIC" used in "step()
function" is "extractAIC". I went to the R help, and found:
"The criterion used is
AIC = - 2*log L + k * edf,
where L is the likelihood and edf the equivalent degrees of freedom (i.e.,
the number of free parameters for usual parametric models) of fit.
For linear models with unknown scale (i.e., for lm and aov), -2log L is
computed from the deviance and uses a different additive constant to logLik
and hence AIC. If RSS denotes the (weighted) residual sum of squares then
extractAIC uses for - 2log L the formulae RSS/s - n (corresponding to
Mallows' Cp) in the case of known scale s and n log (RSS/n) for unknown
scale. AIC only handles unknown scale and uses the formula n log (RSS/n) - n
+ n log 2? - sum log w where w are the weights."
Now, my question is what code I should use to look at the exact calculation
process in the AIC()function and extractAIC() function in R? Thanks!
Dana
--
View this message in context: http://www.nabble.com/AIC-function-and-Step-function-tp20728043p20728043.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list