[R] lmer p-vales are sometimes too small

Olof Leimar olof.leimar at zoologi.su.se
Fri Jan 6 16:27:31 CET 2006

This concerns whether p-values from lmer can be trusted. From 
simulations, it seems that lmer can produce very small, and probably 
spurious, p-values. I realize that lmer is not yet a finished product. 
Is it likely that the problem will be fixed in a future release of the 
lme4 package?

Using simulated data for a quite standard mixed-model anova (a balanced 
two-way design; see code for the function SimMixed pasted below), I 
compared the output of lmer, for three slightly different models, with 
the output of aov. For an example where there is no fixed treatment 
effect (null hypothesis is true), with 4 blocks, 2 treatments, and 40 
observations per treatment-block combination, I find that lmer gives 
more statistical significances than it should, whereas aov does not have 
this problem. An example of output I generated by calling
  > SimMixed(1000)
is the following:

Proportion significances at the 0.05 level
aov:     0.05
lmer.1:  0.148
lmer.2:  0.148
lmer.3:  0.151

Proportion significances at the 0.01 level
aov:     0.006
lmer.1:  0.076
lmer.2:  0.076
lmer.3:  0.077

Proportion significances at the 0.001 level
aov:     0.001
lmer.1:  0.047
lmer.2:  0.047
lmer.3:  0.047

which is based on 1000 simulations (and takes about 5 min on my PowerMac 
G5). The different models fitted are:

fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)

It seems that, depending on the level of the test, lmer gives between a 
factor of 3 to a factor of around 50 times too many significances. The 
first two lmer models seem to give identical results, whereas the third 
(which I think perhaps is the one that best represents the data 
generated by the simulation) differs slightly. In running the 
simulations, warnings like this are occasionally generated:

Warning message:
optim or nlminb returned message false convergence (8)
  in: "LMEoptimize<-"(`*tmp*`, value = list(maxIter = 200, tolerance = 

They seem to derive from the third of the lmer models. Perhaps there is 
some numerical issue in the lmer function? From running SimMixed() 
several times, I have noticed that large p-values (say, larger than 0.5) 
agree very well between lmer and aov, but there seems to be a systematic 
discrepancy for smaller p-values, where lmer gives smaller values than 
aov. The F-values agree between all analyzes (except for fm.lmer.3 when 
there is a warning), so there is a systematic difference between lmer 
and aov in how a p-value is obtained from the F-value, which becomes 
severe for small p-values.

My output from sessionInfo()

R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils" 
"datasets"  "base"

other attached packages:
      lme4   lattice    Matrix
  "0.98-1" "0.12-11"  "0.99-3"

Pasted code for the SimMixed function (some lines might wrap):

# This function generates n.sims random data sets for a design with 4
# blocks, 2 treatments applied to each block, and 40 replicate
# observations for each block-treatment combination. There is no true
# fixed treatment effect, so a statistical significance of a test for
# a fixed treatment effect ought to occur with a probability equal to
# the nominal level of the test. Four tests are applied to each
# simulated data set: the classical aov and three versions of lmer,
# corresponding to different model formulations. The proportion of
# tests for a fixed treatment effect that become significant at the
# 0.05 0.01 and 0.001 levels are printed, as well as the p-values for
# the last of the simulations. In my runs, lmer gives significance
# more often than indicated by the nominal level, for each of the
# three models, whereas aov is OK. The package lme4 needs to be loaded
# to run the code.

SimMixed <- function(n.sims = 1) {
   k <- 4                # number of blocks
   n <- 40               # num obs per block X treatment combination
   m1 <- 1.0             # fixed effect of level 1 of treatment
   m2 <- m1              # fixed effect of level 2 of treatment
   sd.block <- 0.5       # SD of block random effect
   sd.block.trt <- 1.0   # SD of random effect for block X treatm
   sd.res <- 0.1         # Residual SD
   Block <- factor( rep(1:k, each=2*n) )
   Treat <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )
   m <- rep( rep(c(m1, m2), k), each=n) # fixed effects
   # storage for p-values
   p.aov <- rep(0, n.sims)
   p.lmer.1 <- rep(0, n.sims)
   p.lmer.2 <- rep(0, n.sims)
   p.lmer.3 <- rep(0, n.sims)
   for (i in 1:n.sims) {
     # first get block and treatment random deviations
     b <- rep( rep(rnorm(k, 0, sd.block), each=2) +
              rnorm(2*k, 0, sd.block.trt), each=n )
     # then get response
     y <- m + b + rnorm(2*k*n, 0, sd.res)
     dat <- data.frame(Block, Treat, y)
     # perform the tests
     fm.aov <- aov(y ~ Treat+Error(Block/Treat), data = dat)
     fm.lmer.1 <- lmer(y ~ Treat+(Treat|Block), data = dat)
     fm.lmer.2 <- lmer(y ~ Treat+(Treat-1|Block), data = dat)
     fm.lmer.3 <- lmer(y ~ Treat+(1|Block)+(Treat-1|Block), data = dat)
     # store the p-values
     p.aov[i] <- summary(fm.aov)$"Error: Block:Treat"[[1]]$"Pr(>F)"[1]
     p.lmer.1[i] <- anova(fm.lmer.1)[6]
     p.lmer.2[i] <- anova(fm.lmer.2)[6]
     p.lmer.3[i] <- anova(fm.lmer.3)[6]

   cat("\nProportion significances at the 0.05 level \n")
   cat("aov:    ", sum(p.aov<0.05)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.05)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.05)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.05)/n.sims, "\n")

   cat("\nProportion significances at the 0.01 level \n")
   cat("aov:    ", sum(p.aov<0.01)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.01)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.01)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.01)/n.sims, "\n")

   cat("\nProportion significances at the 0.001 level \n")
   cat("aov:    ", sum(p.aov<0.001)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.001)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.001)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.001)/n.sims, "\n")

   cat("\nFinal aov analysis: \n")
   print(summary(fm.aov)$"Error: Block:Treat")
   cat("\nFinal lmer analysis 1: \n")
   cat("\nFinal lmer analysis 2: \n")
   cat("\nFinal lmer analysis 3: \n")

Olof Leimar, Professor
Department of Zoology
Stockholm University
SE-106 91 Stockholm

olof.leimar at zoologi.su.se

More information about the R-help mailing list