[R] lmer p-vales are sometimes too small

Sun Jan 8 23:03:02 CET 2006

I have just uploaded version 0.99-6 of the Matrix package to the
incoming area at CRAN.  It should appear on the archives in the next
day or two.

In this version all degrees of freedom, test statistics and p-values
have been removed from the summary, show and anova methods.  I agree
with John Maindonald that it is better not to give any p-values than
to give misleading, sometimes seriously misleading, p-values.

All the code for lmer is, of course, open source.  At present it
resides in the Matrix package but it may return to the lme4 package
following the release of R-2.3.0.  The sources for the Matrix package
are available at

  https://svn.r-project.org/R-packages/trunk/Matrix

An experimental version of lmer based on a supernodal sparse Cholesky
factorization is at

  https://svn.r-project.org/R-packages/branches/Matrix-mer2

That is the current development branch.  Once I get issues with the
PQL and Laplace methods for GLMMs resolved this branch will be merged
back into the trunk.  If you are only interested in linear mixed
models it is better to work with this branch.

In both branches there is an R function called getFixDF that currently
is a stub returning incorrect answers (as indicated in the comments). 
At present I don't know to get correct answers for the range of models
that can be fit by lmer.  I welcome submissions of patches.

On 1/6/06, Olof Leimar <olof.leimar at zoologi.su.se> wrote:
> This concerns whether p-values from lmer can be trusted. From
> simulations, it seems that lmer can produce very small, and probably
> spurious, p-values. I realize that lmer is not yet a finished product.
> Is it likely that the problem will be fixed in a future release of the
> lme4 package?
>
> Using simulated data for a quite standard mixed-model anova (a balanced
> two-way design; see code for the function SimMixed pasted below), I
> compared the output of lmer, for three slightly different models, with
> the output of aov. For an example where there is no fixed treatment
> effect (null hypothesis is true), with 4 blocks, 2 treatments, and 40
> observations per treatment-block combination, I find that lmer gives
> more statistical significances than it should, whereas aov does not have
> this problem. An example of output I generated by calling
>   > SimMixed(1000)
> is the following:
>
> Proportion significances at the 0.05 level
> aov:     0.05
> lmer.1:  0.148
> lmer.2:  0.148
> lmer.3:  0.151
>
> Proportion significances at the 0.01 level
> aov:     0.006
> lmer.1:  0.076
> lmer.2:  0.076
> lmer.3:  0.077
>
> Proportion significances at the 0.001 level
> aov:     0.001
> lmer.1:  0.047
> lmer.2:  0.047
> lmer.3:  0.047
>
> which is based on 1000 simulations (and takes about 5 min on my PowerMac
> G5). The different models fitted are:
>
> fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
> fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
> fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
> fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)
>
> It seems that, depending on the level of the test, lmer gives between a
> factor of 3 to a factor of around 50 times too many significances. The
> first two lmer models seem to give identical results, whereas the third
> (which I think perhaps is the one that best represents the data
> generated by the simulation) differs slightly. In running the
> simulations, warnings like this are occasionally generated:
>
> Warning message:
> optim or nlminb returned message false convergence (8)
>   in: "LMEoptimize<-"(`*tmp*`, value = list(maxIter = 200, tolerance =
> 1.49011611938477e-08,
>
> They seem to derive from the third of the lmer models. Perhaps there is
> some numerical issue in the lmer function? From running SimMixed()
> several times, I have noticed that large p-values (say, larger than 0.5)
> agree very well between lmer and aov, but there seems to be a systematic
> discrepancy for smaller p-values, where lmer gives smaller values than
> aov. The F-values agree between all analyzes (except for fm.lmer.3 when
> there is a warning), so there is a systematic difference between lmer
> and aov in how a p-value is obtained from the F-value, which becomes
> severe for small p-values.
>
>
>
> My output from sessionInfo()
>
> R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0
>
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"
> "datasets"  "base"
>
> other attached packages:
>       lme4   lattice    Matrix
>   "0.98-1" "0.12-11"  "0.99-3"
>
>
>
> Pasted code for the SimMixed function (some lines might wrap):
>
> # This function generates n.sims random data sets for a design with 4
> # blocks, 2 treatments applied to each block, and 40 replicate
> # observations for each block-treatment combination. There is no true
> # fixed treatment effect, so a statistical significance of a test for
> # a fixed treatment effect ought to occur with a probability equal to
> # the nominal level of the test. Four tests are applied to each
> # simulated data set: the classical aov and three versions of lmer,
> # corresponding to different model formulations. The proportion of
> # tests for a fixed treatment effect that become significant at the
> # 0.05 0.01 and 0.001 levels are printed, as well as the p-values for
> # the last of the simulations. In my runs, lmer gives significance
> # more often than indicated by the nominal level, for each of the
> # three models, whereas aov is OK. The package lme4 needs to be loaded
> # to run the code.
>
> SimMixed <- function(n.sims = 1) {
>    k <- 4                # number of blocks
>    n <- 40               # num obs per block X treatment combination
>    m1 <- 1.0             # fixed effect of level 1 of treatment
>    m2 <- m1              # fixed effect of level 2 of treatment
>    sd.block <- 0.5       # SD of block random effect
>    sd.block.trt <- 1.0   # SD of random effect for block X treatm
>    sd.res <- 0.1         # Residual SD
>    Block <- factor( rep(1:k, each=2*n) )
>    Treat <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )
>    m <- rep( rep(c(m1, m2), k), each=n) # fixed effects
>    # storage for p-values
>    p.aov <- rep(0, n.sims)
>    p.lmer.1 <- rep(0, n.sims)
>    p.lmer.2 <- rep(0, n.sims)
>    p.lmer.3 <- rep(0, n.sims)
>    for (i in 1:n.sims) {
>      # first get block and treatment random deviations
>      b <- rep( rep(rnorm(k, 0, sd.block), each=2) +
>               rnorm(2*k, 0, sd.block.trt), each=n )
>      # then get response
>      y <- m + b + rnorm(2*k*n, 0, sd.res)
>      dat <- data.frame(Block, Treat, y)
>      # perform the tests
>      fm.aov <- aov(y ~ Treat+Error(Block/Treat), data = dat)
>      fm.lmer.1 <- lmer(y ~ Treat+(Treat|Block), data = dat)
>      fm.lmer.2 <- lmer(y ~ Treat+(Treat-1|Block), data = dat)
>      fm.lmer.3 <- lmer(y ~ Treat+(1|Block)+(Treat-1|Block), data = dat)
>      # store the p-values
>      p.aov[i] <- summary(fm.aov)$"Error: Block:Treat"[[1]]$"Pr(>F)"[1]
>      p.lmer.1[i] <- anova(fm.lmer.1)[6]
>      p.lmer.2[i] <- anova(fm.lmer.2)[6]
>      p.lmer.3[i] <- anova(fm.lmer.3)[6]
>    }
>
>    cat("\nProportion significances at the 0.05 level \n")
>    cat("aov:    ", sum(p.aov<0.05)/n.sims, "\n")
>    cat("lmer.1: ", sum(p.lmer.1<0.05)/n.sims, "\n")
>    cat("lmer.2: ", sum(p.lmer.2<0.05)/n.sims, "\n")
>    cat("lmer.3: ", sum(p.lmer.3<0.05)/n.sims, "\n")
>
>    cat("\nProportion significances at the 0.01 level \n")
>    cat("aov:    ", sum(p.aov<0.01)/n.sims, "\n")
>    cat("lmer.1: ", sum(p.lmer.1<0.01)/n.sims, "\n")
>    cat("lmer.2: ", sum(p.lmer.2<0.01)/n.sims, "\n")
>    cat("lmer.3: ", sum(p.lmer.3<0.01)/n.sims, "\n")
>
>    cat("\nProportion significances at the 0.001 level \n")
>    cat("aov:    ", sum(p.aov<0.001)/n.sims, "\n")
>    cat("lmer.1: ", sum(p.lmer.1<0.001)/n.sims, "\n")
>    cat("lmer.2: ", sum(p.lmer.2<0.001)/n.sims, "\n")
>    cat("lmer.3: ", sum(p.lmer.3<0.001)/n.sims, "\n")
>
>    cat("\nFinal aov analysis: \n")
>    print(summary(fm.aov)$"Error: Block:Treat")
>    cat("\nFinal lmer analysis 1: \n")
>    print(anova(fm.lmer.1))
>    cat("\nFinal lmer analysis 2: \n")
>    print(anova(fm.lmer.2))
>    cat("\nFinal lmer analysis 3: \n")
>    print(anova(fm.lmer.3))
> }
>
>
>
>
> --
> Olof Leimar, Professor
> Department of Zoology
> Stockholm University
> SE-106 91 Stockholm
> Sweden
>
> olof.leimar at zoologi.su.se
> http://www.zoologi.su.se/research/leimar/
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>