[R-sig-ME] Why do extremely similar parametric bootstraps on lmer() objects yield (not only marginally) different results?
Christian Brauner
christianvanbrauner at gmail.com
Sat May 3 13:59:48 CEST 2014
Hello all,
warning ahead: This will be quite a long post. But have faith, it is necessary.
I run parametric bootstraps in order to compare two models: I bootstrap the
likelihood ratio statistic between them. For anybody acquainted with this, the
code should be self-explanatory. Model selection is more or less done; these
are exercises I like to perform on real data sets to get a feel for how
different software packages treat the same models. Unfortunately, I am not in
a position to provide the real data. A few words about the models, though:
Both models are fit on a balanced data set which includes about 483 missing
values. There are slightly more than 11,196 observations (within-subjects
design: 70 subjects). Every subject saw every item only once. The dependent
variable is reading time; sentence type (satz_typ) and matching (passung) are
crossed two-level factors. Context (kontext) and subject (vp) are treated as
random effects. Context has 40 levels and, as you guessed, subject has 70
levels. (See this related post on:
http://stats.stackexchange.com/questions/94302/lmer-parametric-bootstrap-testing-for-fixed-effects):
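Since the variable names in the code are German, here is a rough mock-up of
the data layout. This is not the real data; it ignores the missing values and
the exact design, and only matches the overall shape:

# vp = subject, kontext = context, satz_typ = sentence type,
# passung = matching, value = reading time
wort12_lmm2_melt_mock <- expand.grid(
    vp       = factor(1:70),
    kontext  = factor(1:40),
    satz_typ = factor(c("a", "b")),
    passung  = factor(c("match", "mismatch")))
wort12_lmm2_melt_mock$value <- rlnorm(n = nrow(wort12_lmm2_melt_mock),
                                      meanlog = 6) # made-up scale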
Model formulae:
===============
# I only show the model formulae here because this part stays fixed across
# every call to lmer(). The options change, however, depending on the
# parametric bootstrap procedure used. Any change can be seen in the code:
mod_min := log(value) ~ 1 + (1 + satz_typ | vp) + (1 + satz_typ | kontext)
mod8 := log(value) ~ passung + (1 + satz_typ | vp) + (1 + satz_typ | kontext)
I set up different ways to perform the parametric bootstrap. Below I describe
each way, explain what I changed each time, and show the results. The ultimate
question will be: why the different results, especially in one of the ways to
perform the bootstrap?
Method 1 (pbkrtest).
====================
1.1. lmer() model calls:
------------------------
mod_min <- lmer(formula = log(value)
                          ~ 1
                          + (1 + satz_typ | vp)
                          + (1 + satz_typ | kontext),
                data = wort12_lmm2_melt,
                REML = FALSE)

mod8 <- lmer(formula = log(value)
                       ~ passung
                       + (1 + satz_typ | vp)
                       + (1 + satz_typ | kontext),
             data = wort12_lmm2_melt,
             REML = FALSE)
1.2. Parametric Bootstrap:
--------------------------
# Remove the na.action attribute, otherwise PBmodcomp() will complain, since
# it calls simulate() directly without the argument na.action = na.exclude.
# Relevant: see Ben Bolker's comments on this:
# https://github.com/lme4/lme4/issues/189
# Dirty fix:
mod8_nona <- mod8
mod_min_nona <- mod_min
attr(x = mod8_nona@frame,
     which = "na.action") <- NULL # remove the na.action attribute from the model
attr(x = mod_min_nona@frame,
     which = "na.action") <- NULL # remove the na.action attribute from the model
library(pbkrtest)
pbmod_mod8_mod_min <- PBmodcomp(largeModel = mod8_nona,
                                smallModel = mod_min_nona,
                                nsim = 10000,
                                cl = cl) # cl: parallel cluster (sketched below)
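The cl object passed to PBmodcomp(), and reused by the %dopar% loops in the
later methods, is a parallel cluster whose setup I have not shown. A minimal
sketch, assuming a local multicore machine:

library(parallel)
library(doParallel)
cl <- makeCluster(detectCores() - 1) # PSOCK cluster on the local machine
registerDoParallel(cl)               # backend for the foreach() %dopar% loops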
Take home message PBmodcomp(): p.value = 0.003
1.3 Notable:
------------
The na.action attribute was removed from both models before the simulate() call.
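For reference, my understanding from skimming the pbkrtest sources (treat this
as an assumption, not documented fact) is that the PBtest p-value is the usual
add-one Monte Carlo estimator, i.e. essentially what I compute by hand in
Methods 2 to 5:

# Hedged sketch of the PBtest p-value; pb_pval() is my own helper name,
# not part of pbkrtest:
pb_pval <- function(lrt_sim, lrt_obs) {
    (1 + sum(lrt_sim >= lrt_obs)) / (1 + length(lrt_sim))
}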
Method 2.
=========
2.1 lmer() model calls:
-----------------------
identical to Method 1 (pbkrtest).
2.2 Parametric Bootstrap:
-------------------------
mod <- mod8
modnull <- mod_min
# Save the observed likelihood ratio test statistic.
lrt_obs <- anova(modnull,
                 mod)$Chisq[2]
n_sim <- 10000
dattemp <- mod@frame
ptime <- system.time({
    lrt_sim <- foreach(i = 1:n_sim,
                       .combine = "c",
                       .packages = "lme4") %dopar% {
        ysim <- unlist(x = simulate(object = modnull))
        modnullsim <- lmer(formula = ysim
                                     ~ 1
                                     + (1 + satz_typ | vp)
                                     + (1 + satz_typ | kontext),
                           data = dattemp,
                           REML = FALSE) # Fit the null model.
        modaltsim <- lmer(formula = ysim
                                    ~ passung
                                    + (1 + satz_typ | vp)
                                    + (1 + satz_typ | kontext),
                          data = dattemp,
                          REML = FALSE) # Fit the alternative model.
        anova(modnullsim,
              modaltsim)$Chisq[2] # Save the likelihood ratio test statistic.
    }
})[3]
(sum(lrt_sim >= lrt_obs) + 1) / (n_sim + 1) # p.value = 0.00639936
Take home message Method 2: p.value = 0.00639936
2.3 Notable:
------------
The na.action attribute has not been altered!
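As a sanity check for this method (my own diagnostic, not part of the run
above): the simulated response should line up one-to-one with the rows of the
model frame used for refitting, with no NA padding.

ysim_check <- unlist(simulate(object = modnull))
stopifnot(length(ysim_check) == nrow(dattemp), # one value per model-frame row
          !anyNA(ysim_check))                  # no NA padding in the simulations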
Method 3.
=========
3.1 lmer() model calls:
-----------------------
# differences explained in 3.3
mod_min <- lmer(formula = log(value)
                          ~ 1
                          + (1 + satz_typ | vp)
                          + (1 + satz_typ | kontext),
                data = wort12_lmm2_melt_nona,
                REML = FALSE)

mod8 <- lmer(formula = log(value)
                       ~ passung
                       + (1 + satz_typ | vp)
                       + (1 + satz_typ | kontext),
             data = wort12_lmm2_melt_nona,
             REML = FALSE)
3.2 Parametric Bootstrap:
-------------------------
identical to Method 2
Take home message Method 3: p.value = 0.00659934
3.3 Notable:
------------
The models were fit on a version of the original data set reduced to complete
cases, i.e. on a data set created by the following code:
wort12_lmm2_melt_nona <-
    wort12_lmm2_melt[complete.cases(wort12_lmm2_melt), ]
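An equivalent, slightly shorter way to drop the incomplete rows:

wort12_lmm2_melt_nona <- na.omit(wort12_lmm2_melt)
# Note: unlike the complete.cases() subset, na.omit() attaches an
# na.action attribute to the data frame itself; that should be harmless here.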
Method 4.
=========
4.1 lmer() model calls:
-----------------------
# differences explained in 4.3
mod_min_naexclude <- lmer(formula = log(value)
                                    ~ 1
                                    + (1 + satz_typ | vp)
                                    + (1 + satz_typ | kontext),
                          data = wort12_lmm2_melt,
                          REML = FALSE,
                          na.action = na.exclude)

mod8_naexclude <- lmer(formula = log(value)
                                 ~ passung
                                 + (1 + satz_typ | vp)
                                 + (1 + satz_typ | kontext),
                       data = wort12_lmm2_melt,
                       REML = FALSE,
                       na.action = na.exclude)
4.2 Parametric Bootstrap:
-------------------------
mod <- mod8_naexclude
modnull <- mod_min_naexclude
# Save the observed likelihood ratio test statistic.
lrt_obs <- anova(modnull,
                 mod)$Chisq[2]
n_sim <- 10000
ptime <- system.time({
    lrt_sim <- foreach(i = 1:n_sim,
                       .combine = "c",
                       .packages = "lme4") %dopar% {
        ysim <- unlist(simulate(object = modnull))
        modnullsim <- refit(object = modnull,
                            newresp = ysim) # Fit the null model.
        modaltsim <- refit(object = mod,
                           newresp = ysim) # Fit the alternative model.
        anova(modnullsim,
              modaltsim)$Chisq[2] # Save the likelihood ratio test statistic.
    }
})[3]
Take home message Method 4: p.value = 0.00429957
4.3 Notable:
------------
The models were fit with the option na.action = na.exclude.
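Given the na.exclude setting, one thing I would still verify (a diagnostic of
my own, not run above) is whether simulate() now returns an NA-padded vector
of the full data length, since refit() would then see a different response
than the fresh lmer() calls in Methods 2 and 3 do:

ysim <- unlist(simulate(object = modnull))
length(ysim)        # full data length (NA-padded) or complete cases only?
anyNA(ysim)         # TRUE would indicate na.exclude padding
nrow(modnull@frame) # number of rows actually used in the fit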
Method 5.
=========
5.1 lmer() model calls:
-----------------------
# differences explained in 5.3
mod_min_naexclude <- lmer(formula = log(value)
                                    ~ 1
                                    + (1 + satz_typ | vp)
                                    + (1 + satz_typ | kontext),
                          data = wort12_lmm2_melt_nona,
                          REML = FALSE)

mod8_naexclude <- lmer(formula = log(value)
                                 ~ passung
                                 + (1 + satz_typ | vp)
                                 + (1 + satz_typ | kontext),
                       data = wort12_lmm2_melt_nona,
                       REML = FALSE)
5.2 Parametric Bootstrap:
-------------------------
identical to Method 4
Take home message Method 5: p.value = 0.0039996
5.3 Notable:
------------
The models were fit on a version of the original data set reduced to complete
cases, i.e. on a data set created by the following code:
wort12_lmm2_melt_nona <-
    wort12_lmm2_melt[complete.cases(wort12_lmm2_melt), ]
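To line up the results before stating the question:

# Take-home p-values of the five methods, side by side:
pvals <- c(method1_pbkrtest = 0.00300000,
           method2          = 0.00639936,
           method3          = 0.00659934,
           method4          = 0.00429957,
           method5          = 0.00399960)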
Question:
=========
Why do Method 2 and Method 3 yield p-values that deviate by about 0.002 from
all the other methods?
(I suspect the explanation of why I do not see it is really simple: I have
just been looking at this code for too long and am missing the obvious.)
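For calibration, a back-of-the-envelope Monte Carlo standard error (my own
addition, not part of the runs): with 10000 simulations, the binomial noise
around a p-value of roughly 0.005 is far smaller than the observed gap:

p <- 0.005; n_sim <- 10000
sqrt(p * (1 - p) / n_sim) # ~0.0007, so a deviation of ~0.002 is about 3 SEs,
                          # which fits the numerical stability noted in (1) below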
Two final remarks:
(1) I ran everything on a cluster. That's why I repeated Methods 2 and 3; the
results they yield appear to be numerically stable.
(2) I plotted the likelihood ratio test statistics for all methods, compared
them to a chi-squared distribution with df = 1, and inspected the density of
the simulated likelihood ratio statistics. Everything looks normal; the
chi-squared approximation is adequate for these models.
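The comparison in remark (2) was along these lines (a rough sketch, not the
exact plotting code):

hist(lrt_sim, breaks = 50, freq = FALSE,
     main = "Bootstrapped LRT statistics vs. chi-squared(df = 1)",
     xlab = "Likelihood ratio statistic")
curve(dchisq(x, df = 1), from = 0.01, to = max(lrt_sim),
      add = TRUE, lwd = 2) # the chi-squared(1) density diverges at 0,
                           # hence from = 0.01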
Thank you so much for your patience!
Best,
Christian Brauner