# [R] AIC mathematical artefact or computation problem?

Ben Bolker bolker at ufl.edu
Fri Mar 24 04:07:11 CET 2006

```
lionel humbert <humbert.lionel <at> courrier.uqam.ca> writes:

>
> Dear R users,
>
> I have fitted many logistic regressions (glm function) with a second-order
> polynomial formula on a data set containing 440 observations of 96
> variables. I plotted AIC against the frequency
> (presences/observations) of each variable and obtained a nearly perfect
> arch effect, symmetric about a frequency of 0.5. I obtain the
> same effect with deterministic data. Maybe I've missed something, but I
> have found nothing in the theoretical calculations that could explain
> this. Could it be due to the computation in R, or is the AIC value
> a function of frequency?
>

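## simulate n Bernoulli responses with logit(p) = a + b*x,
## fit a logistic regression and return its AIC
f <- function(a,b,n=500) {
  x <- runif(n)
  y <- rbinom(n,size=1,prob=plogis(a+b*x))
  AIC(glm(y~x,family=binomial))
}

b <- 0.1                       # weak slope: frequency is driven mostly by a
avec <- seq(-5,5,length=50)    # intercepts spanning frequencies near 0 to near 1
nsim <- 100                    # replicate fits per intercept

resmat <- matrix(nrow=length(avec),ncol=nsim)
for (i in seq_along(avec)) {
  resmat[i,] <- replicate(nsim,f(avec[i],b))
}

## one point per replicate, plotted against the intercept
matplot(avec,resmat,pch=1,col=1,xlab="intercept a",ylab="AIC")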

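## or even more simply: the negative log-likelihood of 10000
## Bernoulli draws, evaluated at the true probability plogis(a)
x2 <- sapply(avec, function(a) {
  p <- plogis(a)
  -sum(dbinom(rbinom(10000,size=1,prob=p),size=1,prob=p,log=TRUE))
})
plot(avec,x2)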

I don't think it's an artifact.  The curve basically
reflects some function of the variance of the binomial distribution,
p*(1-p), which is largest at a frequency of 0.5: the more variance, the
lower the likelihood of any particular outcome, hence the lower the
log-likelihood and the higher the AIC (-2*log-likelihood + 2k).  Doing a
little math would probably get you the exact form of the curve.

```
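A minimal sketch of that math, assuming the simplest case of an intercept-only logistic model (`y ~ 1`) rather than the poster's second-order polynomial models. With $n$ binary responses and observed frequency $\hat p = \bar y$, the maximum-likelihood fit sets every fitted probability to $\hat p$, so

$$\log L = \sum_{i=1}^{n}\bigl[y_i \log \hat p + (1-y_i)\log(1-\hat p)\bigr] = -n\,H(\hat p), \qquad H(p) = -p\log p - (1-p)\log(1-p),$$

and with $k = 1$ parameter, $\mathrm{AIC} = -2\log L + 2k = 2n\,H(\hat p) + 2$. $H$ is the binary entropy: zero at $p = 0$ or $1$, maximal ($\log 2$) at $p = 0.5$, and symmetric, $H(p) = H(1-p)$, so it depends on $p$ only through the variance $p(1-p)$. That is the arch. Extra polynomial terms lower $-2\log L$ only insofar as the predictors explain presence/absence, so when they explain little the AIC should stay close to this curve plus the larger $2k$ penalty. The code below checks the formula against `glm`; `binary_entropy()` and `aic_closed_form()` are hypothetical helpers written for this example, and `n = 440` and the frequencies are illustrative choices.

```r
## Sketch: closed-form AIC of an intercept-only logistic regression,
## checked against glm().  Assumptions: y ~ 1 stands in for the poster's
## polynomial models; binary_entropy() and aic_closed_form() are
## hypothetical helpers; n and the frequencies are illustrative.

## binary entropy H(p) = -p log p - (1-p) log(1-p), taking 0 log 0 = 0
binary_entropy <- function(p) {
  ifelse(p <= 0 | p >= 1, 0, -p * log(p) - (1 - p) * log(1 - p))
}

## AIC = -2 log L + 2k, with log L = -n H(ybar) and k = 1 (the intercept)
aic_closed_form <- function(y) {
  2 * length(y) * binary_entropy(mean(y)) + 2
}

set.seed(1)
n <- 440
freqs <- c(0.05, 0.2, 0.5, 0.8, 0.95)

## simulate one data set per frequency and compare glm's AIC with the formula
check <- t(sapply(freqs, function(p) {
  y <- rbinom(n, size = 1, prob = p)
  c(freq   = mean(y),
    glm    = AIC(glm(y ~ 1, family = binomial)),
    closed = aic_closed_form(y))
}))
print(check)

## the arch itself: closed-form AIC over a grid of frequencies
p_grid <- seq(0.01, 0.99, by = 0.01)
curve_aic <- 2 * n * binary_entropy(p_grid) + 2
plot(p_grid, curve_aic, type = "l",
     xlab = "frequency (presences/observations)",
     ylab = "AIC, intercept-only model")
```

The `glm` and `closed` columns should agree up to the convergence tolerance of the IRLS fit, and the plotted curve peaks at a frequency of 0.5, where $H = \log 2$.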