[R-sig-ME] nAGQ = 0

Phillip Alday phillip.alday at mpi.nl
Mon Sep 4 11:34:57 CEST 2017

Doug Bates has posted some discussion on this as part of his newest book
project on mixed models (this time in Julia):


I'll try to summarize. The difference seems to be

- (nAGQ=0, fast=true) the fixed effects (beta) are estimated with the
conditional modes (b) of the random effects as part of the penalized
iterative least squares (PIRLS) step. Only the covariance parameters
(theta) are estimated during the nonlinear optimization step.


- (nAGQ=1, fast=slow) the fixed effect estimates are further optimized
with the covariance parameters as part of the nonlinear optimization
step i.e the Laplace approximation to the deviance.

Some software supports nAGQ > 1, in which case we have a better but more
numerically intensive approximation to the optimization; the Laplace
approximation is just a special case of Gaussian quadrature with one
quadrature point. I think lme4/MixedModels.jl supports this only for
single scalar (intercept-only) random effect.

As John pointed out, there doesn't seem to be other software that is
willing to omit the fixed effects from quadrature procedure as
lme4/MixedModels.jl does for nAGQ=0. For lme4/MixedModels.jl, they are
used as the starting values for the Laplace approximation, so it's
'free' to expose them to the user.

(If I recall correctly, you get the conditional modes for "free" in the
LMM case, at least for the numerical procedure used in
lme4/MixedModels.jl. The lme4 paper has more details on this, if you're
up for the math.)

Having additional parameters in the nonlinear optimization step of
course increases the dimensionality of that step and slows it down a
bit. However, estimating the fixed effects with the full covariance
parameters will often yield a better fit and will affect error estimates
and thus t-/z-values. I'm not sure how much it will affect the
coefficient estimates for 'well-behaved' datasets -- my guess is that
computing the fixed effects with the covariance parameters will affect
the relative amount of shrinkage amongst the fixed effects, increasing
it for some, decreasing it for others.

It's interesting to note that the conditional mode estimates are the
same in both cases because they are estimated first.


On 09/04/2017 12:59 AM, Poe, John wrote:
> Oh and on the model comparison thing the only way i know how to directly
> compare random effects distribution estimates is with gateaux derivatives a
> la what Sophia Rabe-Hesketh did in her GLLAMM software for nonparametric
> random effects estimation.
> Usually i just kind of eyeball it and try to be overly conservative with
> quadrature points or use MCMC.
> I guess you could rig up some regular deviance test to do the same thing?
> On Sep 3, 2017 6:51 PM, "Poe, John" <jdpo223 at g.uky.edu> wrote:
>> I'm pretty sure that nAGQ=0 is generating conditional modes of group
>> values for the random effects without subsequently using a laplace
>> approximation. This is really not something that you want to do unless it's
>> a last resort. It's kind of like estimating a linear probability model with
>> a random intercept and using the values for that in the cloglog model. My
>> understanding is that it's not even an option in most other software.
>> Someone please correct me if I'm wrong here because what I've found on it
>> has been kind of vague and I'm making some assumptions.
>> My guess is that the random intercepts/slopes are going to be too small
>> and their distributions could be distorted if you're not actually
>> approximating the integral with something like quadrature or mcmc. That's
>> the case with PQL and MQL according to simulation evidence at least and
>> even though this isn't the same as PQL I'd expect a similar problem at
>> first blush.
>> As to if this is a real practical problem or a theory  problem there's
>> been kind of a disagreement on that within the stats literature. Some
>> people argue, in binary outcome models, that biased random intercepts can
>> bias everything else and others have argued this fear is overblown. This
>> might well have been settled by actual statisticians by now, I'm not sure.
>> It's gotten enough attention in the literature that people certainly
>> worried about it a lot.
>> Below are two articles on the topic with simulations but I've seen the
>> fixed effects results (and LR tests) change based on the random effects
>> approximation technique in my work so I'm always a bit paranoid about it.
>> Model misspecification and having oddly shaped random intercepts (as with
>> count models) can seem to make this problem worse.
>> You can try using the BRMS package if you aren't comfortable switching to
>> something totally unfamiliar. It's a wrapper for Stan designed to use lme4
>> syntax and a lot of good default settings. It's pretty easy to use if you
>> know lme4 syntax and can read up on mcmc diagnostics.
>> Litière, S., et al. (2008). "The impact of a misspecified random‐effects
>> distribution on the estimation and the performance of inferential
>> procedures in generalized linear mixed models." Stat Med 27(16): 3125-3144.
>> McCulloch, C. E. and J. M. Neuhaus (2011). "Misspecifying the shape of a
>> random effects distribution: why getting it wrong may not matter."
>> Statistical Science: 388-402.
>> On Sep 3, 2017 5:49 PM, "Rolf Turner" <r.turner at auckland.ac.nz> wrote:
>>> On 04/09/17 03:48, Jonathan Judge wrote:
>>>> Rolf:
>>>> I have not studied this extensively with smaller datasets, but with
>>>> larger datasets --- five-figure and especially six-figure n --- I have
>>>> found that it often makes no difference.
>>> When uncertain, I have used a likelihood ratio test to see if the
>>>> differences are likely to be material.
>>> My overall suggestion would be that if the dataset is small enough
>>>> for  this choice to matter, it is probably also small enough to solve the
>>>> model through MCMC, in which case I would recommending using that,
>>>> because the incorporated uncertainty often gives you better parameter
>>>> estimates than any increased level of quadrature.
>>> Thanks Jonathan.
>>> (a) How small is "small"?  I have 3 figure n's. I am currently mucking
>>> about with two data sets.  One has 952 observations (with 22 treatment
>>> groups, 3 random effect reps per group).  The other has 142 observations
>>> (with 6 treatment groups and again 3 reps per group).  Would you call the
>>> latter data set small?
>>> (b) I've never had the courage to try the MCMC approaches to mixed
>>> models; have just used lme4.  I guess it's time that I bit the bullet.
>>> Psigh.  This is going to take me a while.  As an old dog I *can* learn
>>> new tricks, but I learn them *slowly*. :-)
>>> (c) In respect of the likelihood ratio test that you suggest --- sorry to
>>> be a thicko, but I don't get it.  It seems to me that one is fitting the
>>> *same model* in both instances, so the "degrees of freedom" for such a test
>>> would be zero.  What am I missing?
>>> Thanks again.
>>> cheers,
>>> Rolf
>>> --
>>> Technical Editor ANZJS
>>> Department of Statistics
>>> University of Auckland
>>> Phone: +64-9-373-7599 ext. 88276
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 	[[alternative HTML version deleted]]
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

More information about the R-sig-mixed-models mailing list