[R] AIC mathematical artefact or computation problem ?
Ben Bolker
bolker at ufl.edu
Fri Mar 24 04:07:11 CET 2006
lionel humbert <humbert.lionel <at> courrier.uqam.ca> writes:
>
> Dear R user,
>
> I have run many logistic regressions (glm function) with a second-order
> polynomial formula on a data set containing 440 observations of 96
> variables. I’ve plotted the AIC against the frequency
> (presences/observations) of each variable and I obtain a nearly perfect
> arch effect with an axis of symmetry at a frequency of 0.5. I obtain the
> same effect with deterministic data. Maybe I’ve missed something, but I
> have found nothing that could explain this in the theoretical
> calculation. Could it be due to the computation in R, or is the AIC value
> a function of frequency?
>
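## simulate a logistic regression with intercept a and slope b,
## and return the AIC of the fitted model
f <- function(a,b,n=500) {
  x <- runif(n)
  y <- rbinom(n,size=1,prob=plogis(a+b*x))
  AIC(glm(y~x,family=binomial))
}
b <- 0.1
avec <- seq(-5,5,length=50)
nsim <- 100
resmat <- matrix(nrow=length(avec),ncol=nsim)
for (i in seq_along(avec)) {
  resmat[i,] <- replicate(nsim,f(avec[i],b))
}
## AIC vs. intercept, one line per simulation
matplot(avec,resmat,type="l")
## or even more simply: the negative log-likelihood of
## Bernoulli samples, evaluated at the true probability
x2 <- sapply(avec,
   function(a) {-sum(dbinom(rbinom(10000,size=1,prob=plogis(a)),
                            size=1,prob=plogis(a),log=TRUE))})
plot(avec,x2)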
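I don't think it's an artifact. The curve basically
reflects some function of the variance of the binomial distribution:
the more variance, the lower the likelihood of any particular outcome,
and so the higher the negative log-likelihood and the AIC. The variance
p(1-p) is largest at p=0.5 (close to a=0 in the simulations above), which
is why the arch is symmetric about a frequency of 0.5. Doing a little math
would probably get you the exact form of the curve.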
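Here is a rough sketch of that math, assuming the fitted probability is close to the true probability p (so the effect of fitting a slope is ignored). The expected negative log-likelihood of one Bernoulli(p) observation is the entropy H(p) = -p*log(p) - (1-p)*log(1-p), which is symmetric about p=0.5 and peaks there at log(2). For n observations and k fitted parameters the AIC should then be roughly 2*n*H(p) + 2*k. The helper names bern_entropy and approx_aic are made up for this illustration, and the comparison uses an intercept-only model rather than the polynomial fits from the original question.

```r
## Entropy of a Bernoulli(p) outcome: the expected negative
## log-likelihood per observation (taken to be 0 at p = 0 or 1)
bern_entropy <- function(p) {
  ifelse(p <= 0 | p >= 1, 0, -p * log(p) - (1 - p) * log(1 - p))
}

## Approximate AIC for n observations and k fitted parameters,
## assuming the fitted probabilities are close to the true p
approx_aic <- function(p, n, k = 1) {
  2 * n * bern_entropy(p) + 2 * k
}

set.seed(101)
n <- 10000
p <- seq(0.05, 0.95, by = 0.05)

## simulated AIC of an intercept-only logistic regression at each p
sim_aic <- sapply(p, function(pp) {
  y <- rbinom(n, size = 1, prob = pp)
  AIC(glm(y ~ 1, family = binomial))
})

## points: simulated AIC; line: entropy-based approximation
plot(p, sim_aic, xlab = "frequency", ylab = "AIC")
lines(p, approx_aic(p, n))
```

If the approximation is reasonable, the points should fall close to the line and trace the same arch, with the peak at a frequency of 0.5.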