[R] AIC, AICc, and K
Renaud Lancelot
renaud.lancelot at cirad.fr
Mon Dec 6 13:37:55 CET 2004
Martin Maechler a écrit :
>>>>>>"Spencer" == Spencer Graves <spencer.graves at pdf.com>
>>>>>> on Sat, 04 Dec 2004 17:09:26 -0800 writes:
>
>
> Spencer> I don't know the "best" way, but the following looks like it will
> Spencer> work:
>
> Spencer> tstDF <- data.frame(x=1:3, y=c(1,1,2))
> Spencer> fit0 <- lm(y~1, tstDF)
> Spencer> fitDF <- lm(y~x, tstDF)
> Spencer> AIC(fitDF,fit0)
> Spencer> df AIC
> Spencer> fitDF 3 5.842516
> Spencer> fit0 2 8.001399
>
> Spencer> The function AIC with only 1 argument returns only
> Spencer> a single number. However, given nested models, it
> Spencer> returns a data.frame with colums df and AIC. At
> Spencer> least in this example (and I would think in all
> Spencer> other contexts as well), "df" is the K you want.
>
> yes, but Benjamin would hardly want to invent a second model
> just in order to call AIC() with both models and get the data
> frame...
>
> As Benjamin hoped, the "df" is part of the logLik(.) result,
> and if either of you had more carefully looked at help(logLik),
> you'd have seen
>
>
>>>Value:
>>>
>>> Returns an object, say 'r', of class 'logLik' which is a number
>>> with attributes, 'attr(r, "df")' (*d*egrees of *f*reedom) giving
>>> the number of parameters in the model. There's a simple 'print'
>>> method for 'logLik' objects.
>
>
> More directly, you can use
>
> > str(unclass(logLik(fit0)))
> atomic [1:1] -2
> - attr(*, "nall")= int 3
> - attr(*, "nobs")= int 3
> - attr(*, "df")= num 2
>
> or
>
> > stats:::AIC.logLik
> function (object, ..., k = 2)
> -2 * c(object) + k * attr(object, "df")
>
> to see how these work.
>
> Martin Maechler, ETH Zurich
[snip]
Yes, but see the problem recently outlined by Prof. Ripley about nobs:
> On Sun, 31 Oct 2004, Prof Brian Ripley wrote:
>
>
>>> The harder task is actually to get `n', not `npar'.
>>>
>>> length(resid(x)) may or may not include missing values, depending on the
>>> na.action used, and will include observations with weight zero.
>>> However, logLik's "lm" method returns an attribute "nobs" that is a better
>>> choice.
>
>
> But only better, as it looks like it has fallen into the first trap.
> AFAICS, if 'fit' is an lm fit, then
>
> n = df.residual(fit) + fit$rank
>
>
> Quick check:
>
> x <- rnorm(10); x[2] <- NA
> y <- rnorm(10); y[1] <- NA
> w <- c(rep(1,9), 0)
> fit <- lm(y ~x, weights = w, na.action=na.exclude)
> df.residual(fit) + fit$rank # 7, correct
> attributes(logLik(fit)) # nall 9 nobs 9 df 3
> # but AIC fails
> length(resid(fit)) # 10
>
> fit <- lm(y ~x, weights = w, na.action=na.omit)
> df.residual(fit) + fit$rank # 7, correct
> attributes(logLik(fit)) # nall 7 nobs 7 df 3
> length(resid(fit)) # 8
The problem might also become a bit more complicated when dealing with
other kind of models than lm (or weighted lm, etc).
For example, what is nobs when you are using the syntax:
cbind(success, failure) ~ covariates
in a binomial glm ? Is it the number of lines in the aggregated data
frame, or the total number of observations ?
Best,
Renaud
--
Dr Renaud Lancelot, vétérinaire
C/0 Ambassade de France - SCAC
BP 834 Antananarivo 101 - Madagascar
e-mail: renaud.lancelot at cirad.fr
tel.: +261 32 40 165 53 (cell)
+261 20 22 665 36 ext. 225 (work)
+261 20 22 494 37 (home)
More information about the R-help
mailing list