[R-sig-ME] GLMER vs. MCMCglmm - difference in model results - Poisson model with offset term

Bernd Lenzner bernd.lenzner at univie.ac.at
Fri May 24 10:51:26 CEST 2019


Dear everyone,

I have a question regarding the comparison of Generalized Linear Mixed 
Models (GLMM - using the lme4 - package) and Markov-Chain Monte-Carlo 
Generalized Mixed Models (MCMCglmm - using the MCMCglmm-package).

I am running mixed models using both packages. While the estimates 
provided by each approach are relatively comparable, the significance 
levels differ strongly. My main questions are the following:

1. How can the divergence in results be explained, and why do both 
methods give different results even though the model structure is the 
same? (Note: diagnostic plots for both models look fine.)

2. I am aware that there is strong debate about the validity of p-values 
within GLMMs. Is there similar concern regarding MCMCglmm?

3. What would be a sensible interpretation of both model outputs, 
especially with respect to PRED2 and the interaction term?


Below I provide a short description and example output:

I want to test the effect of three continuous predictors (PRED2, PRED3, 
PRED4) on a proportional response (RESP). PRED2 and PRED3 enter the 
model as an interaction (PRED2*PRED3). The response is modeled with a 
Poisson error structure and a log link, and the denominator of the 
proportion is therefore included as an offset term 
(offset(log(PRED1))).
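
To make the offset logic explicit, here is a minimal self-contained sketch 
with simulated (hypothetical) data: a rate y/n is modeled as a Poisson count 
with log(n) as offset, so log(E[y]) = log(n) + b0 + b1*x, i.e. the offset 
coefficient is fixed at 1. The variable names here are invented for 
illustration, not the real data:

set.seed(1)
n <- rpois(100, 50) + 1                          # hypothetical denominators
x <- rnorm(100)
y <- rpois(100, lambda = n * exp(-1 + 0.5 * x))  # true rate: exp(-1 + 0.5*x)
fit <- glm(y ~ offset(log(n)) + x, family = poisson)
coef(fit)   # intercept and slope should land near -1 and 0.5
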


MODEL1 -  GLMM using the lme4-package:


MOD1 <- glmer(RESP ~ offset(log(PRED1)) + PRED2 * PRED3 + PRED4 + 
(1|RANEF), family = "poisson", data = data, 
control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=100000)))

summary(MOD1)
Generalized linear mixed model fit by maximum likelihood (Laplace 
Approximation) ['glmerMod']
  Family: poisson  ( log )
Formula: RESP ~ offset(log(PRED1)) + PRED2 * PRED3 + PRED4 + (1 | RANEF)
    Data: data
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))

      AIC      BIC   logLik deviance df.resid
   2451.3   2470.9  -1219.6   2439.3      190

Scaled residuals:
      Min       1Q   Median       3Q      Max
-10.8175  -1.2477  -0.1913   1.1076  11.0389

Random effects:
  Groups Name        Variance Std.Dev.
  RANEF  (Intercept) 0.1519   0.3898
Number of obs: 196, groups:  RANEF, 56

Fixed effects:
                   Estimate    Std. Error z value Pr(>|z|)
(Intercept)       -3.29896    0.06250    -52.784   < 2e-16 ***
PRED2              0.09089    0.01850     4.912     9.00e-07 ***
PRED3              0.57262    0.02375    24.114    < 2e-16 ***
PRED4              0.54726    0.01808    30.275    < 2e-16 ***
PRED2:PRED3        0.12190    0.01821     6.693     2.19e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
               (Intr)    PRED2     PRED3    PRED4
PRED2         -0.142
PRED3         -0.056    0.102
PRED4         -0.042    0.135    -0.155
PRED2:PRED3    0.001    0.017    -0.678     0.273


MODEL2 -  MCMCglmm using the MCMCglmm-package:

Now I build the same model with MCMCglmm, using weakly informative 
priors. The offset term was implemented following the suggestion by 
Jarrod Hadfield in this post 
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/026016.html). 
See the model formula and summary below:


prior <- list(B = list(V = diag(6) * 10, mu = c(0, 1, 0, 0, 0, 0)),
              G = list(G1 = list(V = 1, nu = 0.002)),
              R = list(V = 1, nu = 0.002))
# near-zero prior variance fixes the log(PRED1) coefficient at its
# prior mean of 1, i.e. it acts as an offset
prior$B$V[2,2] <- 1e-9


PGLMM.MOD1 <- MCMCglmm(RESP ~ log(PRED1) + PRED2 * PRED3 + PRED4,
                       random = ~ RANEF,
                       family = "poisson",
                       prior = prior,
                       scale = TRUE,
                       pr = TRUE,
                       data = data,
                       nitt = 500000, burnin = 20000, thin = 200)


summary(PGLMM.MOD1)

Iterations = 20001:499801
  Thinning interval  = 200
  Sample size  = 2400

  DIC: 1164.591

  G-structure:  ~RANEF

         post.mean  l-95% CI    u-95% CI   eff.samp
RANEF   0.02264    0.0002387   0.0762      2400

  R-structure:  ~units

          post.mean    l-95% CI     u-95% CI     eff.samp
units    0.2788       0.1946       0.3705        2400

  Location effects: RESP ~ log(PRED1) + PRED2 * PRED3 + PRED4

              post.mean   l-95% CI  u-95% CI   eff.samp   pMCMC
(Intercept)  -3.40280   -3.50958   -3.28826   2400       <4e-04 ***
log(PRED1)   1.00000    0.99994    1.00006    2257       <4e-04 ***
PRED2        0.08369    -0.01995   0.18120    2400       0.113
PRED3        0.63058    0.52164    0.73889    2400       <4e-04 ***
PRED4        0.60488    0.50565    0.69963    2400       <4e-04 ***
PRED2:PRED3  0.06882    -0.02363   0.17414    2258       0.167
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


As you can see from the two summaries, the estimates are fairly 
similar, with some divergence especially for PRED2 and the interaction 
term. The p-values, however, differ dramatically: in the GLMM (lme4) 
PRED2 and the interaction are highly significant, whereas in the 
MCMCglmm they are non-significant.
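
For what it is worth, here is roughly how I lined up the uncertainty 
from the two fits side by side (a sketch using the objects above; the 
column indices assume the six fixed effects come first in $Sol, which 
is the case when pr = TRUE appends the random effects afterwards):

library(coda)
confint(MOD1, method = "Wald", parm = "beta_")  # Wald CIs from glmer
HPDinterval(PGLMM.MOD1$Sol[, 1:6])              # HPD intervals, fixed effects
effectiveSize(PGLMM.MOD1$Sol[, 1:6])            # check mixing per coefficient

The Wald intervals for PRED2 and the interaction are much narrower than 
the corresponding HPD intervals, which is consistent with the p-value 
discrepancy.
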

My questions now are:

1. How can the divergence in results be explained, and why do both 
methods give different results even though the model structure is the 
same? (Note: diagnostic plots for both models look fine.)

2. I am aware that there is strong debate about the validity of p-values 
within GLMMs. Is there similar concern regarding MCMCglmm?

3. What would be a sensible interpretation of both model outputs, 
especially with respect to PRED2 and the interaction term?


Any insight and help on that matter would be highly appreciated.

Best regards,

Bernd


-- 
Bernd Lenzner PhD

Researcher
University of Vienna
Department of Botany and Biodiversity Research
Division of Conservation Biology, Vegetation and Landscape Ecology
Rennweg 14
A-1030 Vienna
