[R-sig-ME] Why does anova.merMod use refit?

Wed Oct 2 16:09:26 CEST 2013

Hi,

I stumbled upon something that surprised me in the anova method of lme4 (development version from github), namely that it always refits the model using refit.

This may lead to problems if the fitting algorithm doesn't converge with the default settings for refit (see below for an example) and diverges from the behavior of the anova method prior version 1.0 (which was consequently always very fast).

I would prefer if the refitting done by the anova method could be disabled or optional. Perhaps it would be the best if it gives a warning if models are not fit with REML = FALSE (I don't know what is the best way to find this out in the fitted object).

Usually one does not expect the anova method to be really time intensive, which with refit may well be the case for complex models fit without REML = FALSE.

Best,
Henrik

An example (data from http://stats.stackexchange.com/q/71172/442):

# preparation
require(lme4)
options(contrasts=c('contr.sum', 'contr.poly'))
dat <- read.table("http://pastebin.com/raw.php?i=MmNQigRv", colClasses = c(NA, rep("factor", 2), rep("numeric", 2)))
dat$pos.centered <- scale(dat$position, scale = FALSE)
dat$pos.centered.squared <- scale(dat$position^2, scale = FALSE)

# without REML = FALSE
m1 <- lmer(diff ~ cond.lag * (pos.centered + pos.centered.squared) + (cond.lag * (pos.centered + pos.centered.squared)|sub), dat, control = lmerControl(optCtrl = list(maxfun = 100000)))
m2 <- lmer(diff ~ cond.lag + pos.centered + pos.centered.squared + (cond.lag * (pos.centered + pos.centered.squared)|sub), dat)

anova(m1, m2)
## Data: dat
## Models:
## m2: diff ~ cond.lag + pos.centered + pos.centered.squared + (cond.lag *
## m2:     (pos.centered + pos.centered.squared) | sub)
## m1: diff ~ cond.lag * (pos.centered + pos.centered.squared) + (cond.lag *
## m1:     (pos.centered + pos.centered.squared) | sub)
##    Df    AIC    BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m2 26 -10924 -10763   5488   -10976
## m1 28 -10904 -10732   5480   -10960     0      2          1
## Warnmeldungen:
## 1: In optwrap(optimizer, devfun, x at theta, lower = x at lower) :
##   convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
## 2: In optwrap(optimizer, devfun, x at theta, lower = x at lower) :
##   convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded

# with REML = FALSE

m1r <- lmer(diff ~ cond.lag * (pos.centered + pos.centered.squared) + (cond.lag * (pos.centered + pos.centered.squared)|sub), dat, control = lmerControl(optCtrl = list(maxfun = 100000)), REML = FALSE)
m2r <- lmer(diff ~ cond.lag + pos.centered + pos.centered.squared + (cond.lag * (pos.centered + pos.centered.squared)|sub), dat, REML = FALSE)

anova(m1r, m2r)

## Data: dat
## Models:
## m2r: diff ~ cond.lag + pos.centered + pos.centered.squared + (cond.lag *
## m2r:     (pos.centered + pos.centered.squared) | sub)
## m1r: diff ~ cond.lag * (pos.centered + pos.centered.squared) + (cond.lag *
## m1r:     (pos.centered + pos.centered.squared) | sub)
##     Df    AIC    BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m2r 26 -10838 -10678   5445   -10890
## m1r 28 -10835 -10662   5445   -10891   0.3      2       0.86

-- 
Dipl. Psych. Henrik Singmann
PhD Student
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann