[R] AIC mathematical artefact or computation problem?

Fri Mar 24 04:07:11 CET 2006

lionel humbert <humbert.lionel <at> courrier.uqam.ca> writes:

> 
> Dear R users,
> 
> I have fitted many logistic regressions (glm function) with a second-order 
> polynomial formula on a data set containing 440 observations of 96 
> variables. I’ve plotted AIC versus the frequency 
> (presences/observations) of each variable and I obtain a nearly perfect 
> arch effect, symmetric about a frequency of 0.5. I obtain the 
> same effect with deterministic data. Maybe I’ve missed something, but I 
> have found nothing in the theoretical calculations that could explain 
> this. Could it be due to the computation in R, or is the AIC value 
> a function of frequency?
> 

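## AIC of a logistic regression fitted to n simulated binary responses
## with true intercept a and slope b
f <- function(a, b, n = 500) {
  x <- runif(n)
  y <- rbinom(n, size = 1, prob = plogis(a + b * x))
  AIC(glm(y ~ x, family = binomial))
}

b <- 0.1
avec <- seq(-5, 5, length = 50)
nsim <- 100

## one row per intercept value, one column per simulated data set
resmat <- matrix(nrow = length(avec), ncol = nsim)
for (i in seq_along(avec)) {
  resmat[i, ] <- replicate(nsim, f(avec[i], b))
}

## plot every simulation against the intercept rather than the row index
matplot(avec, resmat, pch = 1, col = 1, xlab = "intercept a", ylab = "AIC")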
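For the simplest case, an intercept-only logistic regression, the dependence on frequency can be made exact. The fitted probability is then the observed frequency p̂, so the maximized log-likelihood is -n H(p̂), where H(p) = -p log p - (1-p) log(1-p), and AIC = 2 n H(p̂) + 2. The sketch below checks this numerically. The helper `binary_entropy`, the sample size 440 (borrowed from the question) and the chosen frequencies are illustrative assumptions, and the identity is exact only for the intercept-only model, not for the polynomial models in the question.

```r
## Binary (Bernoulli) entropy in nats; hypothetical helper for illustration
binary_entropy <- function(p) -(p * log(p) + (1 - p) * log(1 - p))

set.seed(1)
n <- 440                                 # sample size taken from the question
freq <- c(0.05, 0.2, 0.5, 0.8, 0.95)     # illustrative true frequencies
res <- t(sapply(freq, function(p) {
  y <- rbinom(n, size = 1, prob = p)
  fit <- glm(y ~ 1, family = binomial)   # intercept-only logistic regression
  phat <- mean(y)                        # observed frequency
  c(freq = phat,
    glm_AIC = AIC(fit),
    formula_AIC = 2 * n * binary_entropy(phat) + 2)  # k = 1 parameter
}))
print(res)
```

The two AIC columns agree up to the convergence tolerance of glm, and both are largest for the frequency closest to 0.5.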

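## or even more simply: the negative log-likelihood of 10000 Bernoulli
## draws, evaluated at the true probability plogis(a), for each intercept a
x2 <- sapply(avec, function(a) {
  -sum(dbinom(rbinom(10000, size = 1, prob = plogis(a)),
              size = 1, prob = plogis(a), log = TRUE))
})
plot(avec, x2)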
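The simpler simulation can also be compared with its expected value. Each term -log f(y) has expectation equal to the Bernoulli entropy H(p) = -p log p - (1-p) log(1-p), so x2 should scatter around 10000 H(plogis(a)), a curve that peaks at a = 0 (p = 0.5) and is symmetric about it. A sketch under that reading; `binary_entropy` and `expected` are hypothetical names added for illustration, and the simulation is repeated so the block runs on its own.

```r
## Binary (Bernoulli) entropy in nats; hypothetical helper for illustration
binary_entropy <- function(p) -(p * log(p) + (1 - p) * log(1 - p))

set.seed(1)
avec <- seq(-5, 5, length = 50)
n <- 10000
## simulated negative log-likelihood at the true probability, as in x2
x2 <- sapply(avec, function(a) {
  p <- plogis(a)
  -sum(dbinom(rbinom(n, size = 1, prob = p), size = 1, prob = p, log = TRUE))
})
## its expectation: n times the entropy of Bernoulli(plogis(a))
expected <- n * binary_entropy(plogis(avec))

plot(avec, x2, xlab = "intercept a", ylab = "negative log-likelihood")
lines(avec, expected)
```

The scatter around the curve grows toward the tails, where rare outcomes contribute large -log p terms, but the arch shape itself is the entropy curve.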

I don't think it's an artifact. The curve basically reflects some
function of the variance of the binomial distribution, p(1-p), which is
largest at p = 0.5: the more variance, the lower the likelihood of any
particular outcome, and hence the higher the negative log-likelihood
and the AIC. Doing a little math would probably get you the exact form
of the curve.
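One way to do that math, sketched here for the intercept-only case and for models that contain it (this derivation is an addition, not part of the original reply): write H(p) for the Bernoulli entropy, n for the number of observations, p̂ for the observed frequency and k for the number of fitted parameters.

```latex
\begin{aligned}
H(p) &= -p\log p - (1-p)\log(1-p), \qquad H(p) = H(1-p),\\
H'(p) &= \log\frac{1-p}{p} = 0 \iff p = \tfrac12, \qquad H(\tfrac12) = \log 2,\\[4pt]
\mathbb{E}\left[-\log f(Y)\right] &= H(p) \quad\text{for } Y \sim \mathrm{Bernoulli}(p),\\[4pt]
\log L(\hat p) &= \sum_{i=1}^{n}\left[y_i\log\hat p + (1-y_i)\log(1-\hat p)\right]
  = n\left[\hat p\log\hat p + (1-\hat p)\log(1-\hat p)\right] = -n\,H(\hat p),\\
\mathrm{AIC}_0 &= -2\log L(\hat p) + 2 = 2n\,H(\hat p) + 2,\\[4pt]
\log L(\hat\theta) &\ge -n\,H(\hat p)
  \quad\Longrightarrow\quad \mathrm{AIC} = -2\log L(\hat\theta) + 2k \le 2n\,H(\hat p) + 2k .
\end{aligned}
```

The last line holds for any model that includes an intercept (the glm default), because the intercept-only fit is one of its special cases. So the intercept-only AIC is exactly a symmetric function of the observed frequency with its maximum at 0.5, and a model with covariates can only lie on or below 2n H(p̂) + 2k; covariates that explain little keep the AIC close to that arch.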