[R] AIC, AICc, and K

Mon Dec 6 13:37:55 CET 2004

Martin Maechler a Ã©crit :
>>>>>>"Spencer" == Spencer Graves <spencer.graves at pdf.com>
>>>>>>    on Sat, 04 Dec 2004 17:09:26 -0800 writes:
> 
> 
>     Spencer> I don't know the "best" way, but the following looks like it will 
>     Spencer> work: 
> 
>     Spencer> tstDF <- data.frame(x=1:3, y=c(1,1,2))
>     Spencer> fit0 <- lm(y~1, tstDF)
>     Spencer> fitDF <- lm(y~x, tstDF)
>     Spencer> AIC(fitDF,fit0)
>     Spencer> df      AIC
>     Spencer> fitDF  3 5.842516
>     Spencer> fit0   2 8.001399
> 
>     Spencer> The function AIC with only 1 argument returns only
>     Spencer> a single number.  However, given nested models, it
>     Spencer> returns a data.frame with colums df and AIC.  At
>     Spencer> least in this example (and I would think in all
>     Spencer> other contexts as well), "df" is the K you want.
>    
> yes, but Benjamin would hardly want to invent a second model
> just in order to call AIC() with both models and get the data
> frame...
> 
> As Benjamin hoped, the "df" is part of the logLik(.) result,
> and if either of you had more carefully looked at  help(logLik),
> you'd have seen
> 
> 
>>>Value:
>>>
>>>     Returns an object, say 'r', of class 'logLik' which is a number
>>>     with attributes, 'attr(r, "df")' (*d*egrees of *f*reedom) giving
>>>     the number of parameters in the model. There's a simple 'print'
>>>     method for 'logLik' objects.
> 
> 
> More directly, you can use
> 
>     > str(unclass(logLik(fit0)))
>      atomic [1:1] -2
>      - attr(*, "nall")= int 3
>      - attr(*, "nobs")= int 3
>      - attr(*, "df")= num 2
> 
> or
> 
>     > stats:::AIC.logLik
>     function (object, ..., k = 2) 
>     -2 * c(object) + k * attr(object, "df")
> 
> to see how these work.
> 
> Martin Maechler, ETH Zurich

[snip]

Yes, but see the problem recently outlined by Prof. Ripley about nobs:

> On Sun, 31 Oct 2004, Prof Brian Ripley wrote:
> 
> 
>>> The harder task is actually to get `n', not `npar'.
>>> 
>>> length(resid(x)) may or may not include missing values, depending on the 
>>> na.action used, and will include observations with weight zero.
>>> However, logLik's "lm" method returns an attribute "nobs" that is a better 
>>> choice.
> 
> 
> But only better, as it looks like it has fallen into the first trap.
> AFAICS, if 'fit' is an lm fit, then
> 
> n = df.residual(fit) + fit$rank
> 
> 
> Quick check:
> 
> x <- rnorm(10); x[2] <- NA
> y <- rnorm(10); y[1] <- NA
> w <- c(rep(1,9), 0)
> fit <- lm(y ~x, weights = w, na.action=na.exclude)
> df.residual(fit) + fit$rank  # 7, correct
> attributes(logLik(fit)) # nall 9 nobs 9 df 3
> # but AIC fails
> length(resid(fit)) # 10
> 
> fit <- lm(y ~x, weights = w, na.action=na.omit)
> df.residual(fit) + fit$rank  # 7, correct
> attributes(logLik(fit)) # nall 7 nobs 7 df 3
> length(resid(fit)) # 8

The problem might also become a bit more complicated when dealing with 
other kind of models than lm (or weighted lm, etc).

For example, what is nobs when you are using the syntax:

cbind(success, failure) ~ covariates

in a binomial glm ? Is it the number of lines in the aggregated data 
frame, or the total number of observations ?

Best,

Renaud

-- 
Dr Renaud Lancelot, vÃ©tÃ©rinaire
C/0 Ambassade de France - SCAC
BP 834 Antananarivo 101 - Madagascar

e-mail: renaud.lancelot at cirad.fr
tel.:   +261 32 40 165 53 (cell)
         +261 20 22 665 36 ext. 225 (work)
         +261 20 22 494 37 (home)