[R-sig-ME] cAIC
Oliver Soong
osoong+r at gmail.com
Sat Mar 16 06:12:13 CET 2013
Hi,
I'm using a linear mixed effects model on estimates of plant cover in
different years in plots situated along transects within different
zones. The full set of random effects is (1 | year) + (1 |
zone/transect/plot), where each term is treated categorically. I'm
interested in determining whether the plot-level random effect is
worth including in the model, and of course, that's where the trouble
begins.
I'm thinking in terms of AIC. Of course, the problem with AIC is
determining the d.f. of random effects. As I've read a number of
times, the appropriate d.f. lies somewhere between 1 and the number of
random effect groups/clusters, depending on the random effect variance
lying somewhere in (0,Inf).
In my naive view of things, then, the true AIC lies somewhere between
AIC1 based on 1 d.f. per random effect and AICn based on the number of
random effect groups. Dangerously pursuing this naive train of
thought, given two nested models in which the additional random
effects is itself nested (e.g., zone/transect vs. zone/transect/plot),
there is some d.f. per random effect at which the AIC differ by 2.
I'm ignoring the case in which the additional random effect improves
model fit by so much that comparison is trivial (which is not my
case).
I looked into Vaida & Blanchard (2005) and Greven & Kneib (2010). I
won't pretend to understand everything they're doing, but I did dig up
the code used by G&K. I think I managed to adapt it to work with lme4
in addition to nlme (through RLRsim), and I deleted their rescaling
code (which was buggy) and simplified the calculation of the
conditional log-likelihood (which had broken). I've appended that
code, and would definitely appreciate if anybody who might understand
conditional AIC (cAIC) and/or corrected conditional AIC (ccAIC) could
take a look. My apologies for the line breaks that e-mail will
butcher.
At the presumption of trying to use this code, I noticed something
surprising. I'm comparing my zone/transect model against my
zone/transect/plot model. The difference in log-likelihood is ~0.2.
Even using AIC1, I would conclude that adding the plot-level random
effects does not significantly improve the model fit, although I can
only claim that the simpler model with only the transect-level random
effect is significantly better if the plot-level random effect
represents at least ~1.1 degree of freedom. cAIC favors the model
with plot-level random effects (the difference in cAIC is ~1.5), which
is the bias G&K address. However, their ccAIC also favors the model
with plot-level random effects, and the difference in ccAIC is even
larger (~4.8). Given the very modest improvement in log-likelihood
afforded by the plot-level random effects, I'm surprised by these
results, although given my general ignorance, I wouldn't be surprised
if I've grossly misunderstood something.
I have 1863 observations and 21 fixed effects (including intercept).
Except for the additional plot-level random effect, the two fitted
fitted models are essentially identical (differences in coefficients
for terms shared between the zone/transect and zone/transect/plot
models are less than 1% of the corresponding estimated variances).
I'm using R 2.15.2 and lme4 0.999999.0.
Oliver
PS: The code borks when any random effect variance is 0.
tr <- function (x, na.rm = FALSE) {
sum(diag(x), na.rm = na.rm)
}
cAIC <- function(m) {
require(RLRsim)
if(class(m) == "lme") {
design <- extract.lmeDesign(m)
method <- m$method
} else if(class(m) == "mer") {
require(lme4)
design <- extract.lmerDesign(m)
method <- if(isREML(m)) "REML" else "ML"
} else stop("m must be of class lme or mer")
X <- design$X
Z <- design$Z
y <- design$y
lambda <- design$lambda
sigmasq <- design$sigmasq
n <- nrow(X)
tausq <- sigmasq * lambda
lambda <- 1 / lambda
dvr <- diag(design$Vr)
dvrFactor <- factor(dvr, levels = unique(dvr))
repseq <- table(dvrFactor)
V <- diag(rep(sigmasq, n)) + Z %*% diag(rep(tausq, repseq)) %*% t(Z)
H <- cbind(X, Z)
H <- t(H) %*% H
H1 <- solve(H + diag(c(rep(0, ncol(X)), rep(lambda, repseq))))
V0 <- diag(rep(1, n)) + Z %*% diag(rep(tausq, repseq)) %*% t(Z)
V0inv <- diag(rep(1, n)) - Z %*% solve(t(Z) %*% Z +
diag(rep(lambda, repseq))) %*% t(Z)
P <- diag(rep(1, n)) - X %*% solve(t(X) %*% V0inv %*% X) %*% t(X) %*% V0inv
A <- t(P) %*% V0inv
B <- matrix(0, length(tausq), length(tausq))
C <- matrix(0, length(tausq), n)
tyAy <- as.numeric(t(y) %*% A %*% y)
for (j in 1:nlevels(dvrFactor)) {
Zj <- Z[, dvrFactor == levels(dvrFactor)[j]]
AZjtZjA <- A %*% Zj %*% t(Zj) %*% A
tyAZjtZjAy <- as.numeric(t(y) %*% AZjtZjA %*% y)
C[j, ] <- 2 * (tyAy * t(y) %*% AZjtZjA - tyAZjtZjAy * t(y) %*% A)
for (k in j:nlevels(dvrFactor)) {
Zk <- Z[, dvrFactor == levels(dvrFactor)[k]]
AZktZkA <- A %*% Zk %*% t(Zk) %*% A
tyAZktZkAy <- as.numeric(t(y) %*% AZktZkA %*% y)
if(method == "ML") {
B[j, k] <- B[k, j] <- -tyAy ^ 2 * tr((t(Zk) %*% V0inv
%*% Zj) %*% (t(Zj) %*% V0inv %*% Zk)) / n - tyAZjtZjAy * tyAZktZkAy +
2 * tyAZktZkAy * tyAy
} else {
B[j, k] <- B[k, j] <- -tyAy ^ 2 * tr((t(Zk) %*% A %*%
Zj) %*% (t(Zj) %*% A %*% Zk)) / (n - ncol(X)) - tyAZjtZjAy *
tyAZktZkAy + 2 * tyAZktZkAy * tyAy
}
}
}
Binv <- solve(B)
rho <- n - tr(A)
Phi0 <- rho + sum(sapply(levels(dvrFactor), function(j) {
Zj <- Z[, dvrFactor == j]
sum(Binv %*% C %*% A %*% Zj %*% t(Zj) %*% A %*% y)
}))
ll <- sum(dnorm(y - fitted(m), 0, sqrt(sigmasq), log = TRUE))
caic <- -2 * ll + 2 * (rho + 1)
ccaic <- -2 * ll + 2 * (Phi0 + 1)
list(cAIC = caic, ccAIC = ccaic, cLogLik = ll, rho = rho, Phi0 = Phi0)
}
More information about the R-sig-mixed-models
mailing list