[R-sig-ME] nAGQ > 1 in lme4::glmer gives unexpected likelihood

Sat Apr 25 00:31:33 CEST 2020

I modified Ben Goldstein's code slightly to show that it's indeed the nAGQ log-likelihood that's off by a constant, rather than the Laplace.
While for model selection reasons one can argue that the constant doesn't matter, it is good to know that this distinction exists.
I'm not quite sure if/why the right log-likelihood could be computed; I could see this as needed/useful for comparison between different models, i.e. outside the Poisson or even GLME class.

Florin

#### Likelihood of Poisson Models

# r-sig-mixed-models GLMM question*
  library(lme4)
set.seed(51)

# Simulate some random effect-driven Poisson data*
# random_effects <- rnorm(10, 0, 2)
# random_effects <- rnorm(10, 0, 0.01)
random_effects <- rnorm(10, 0, 0.01)

group <- rep(1:10, 10)
simulated_data <- data.frame(y = rpois(n = 100, lambda = exp(3 +
                                                               random_effects[group])),
                             group = group)

# Fit models with Laplace (nAGQ = 1) and nAGQ = 11*
  fit_Laplace <- glmer(y ~ (1|group), data = simulated_data, family =
                         poisson())
fit_AGQ <- glmer(y ~ (1|group), data = simulated_data, family = poisson(),
                 nAGQ = 11)

fit_glm = glm(y ~ group, data=simulated_data, family=poisson)

(logLik(fit_Laplace))
(logLik(fit_AGQ))
(logLik(fit_glm)). # very similar to logLik(fit_Laplace)

> On Apr 24, 2020, at 7:59 AM, Ben Bolker <bbolker using gmail.com> wrote:
> 
> 
>    I found the note I was looking for.  In ?deviance.glmerMod it says:
> 
>  If adaptive Gauss-Hermite quadrature is used, then
>           ‘logLik(object)’ is currently only proportional to the
>           absolute-unconditional log-likelihood.
> 
> (see the discussion on this page for more context); see also https://github.com/lme4/lme4/blob/master/misc/logLikGLMM/logLikGLMM.R
> 
> 
> On 4/24/20 10:24 AM, Douglas Bates wrote:
>> Having said that, I do see that the fits in the MixedModels package for
>> Julia produce similar values of the deviance with the Laplace approximation
>> and nAGQ = 7
>> 
>> julia> m1 = fit(MixedModel, @formula(y ~ 1 + (1|group)), dd, Poisson())
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
>>   y ~ 1 + (1 | group)
>>   Distribution: Poisson{Float64}
>>   Link: LogLink()
>> 
>>   Deviance: 193.5587
>> 
>> Variance components:
>>          Column    Variance  Std.Dev.
>> group (Intercept)  3.9577026 1.9893975
>> 
>>  Number of obs: 100; levels of grouping factors: 10
>> 
>> Fixed-effects parameters:
>> ──────────────────────────────────────────────────
>>              Estimate  Std.Error  z value  P(>|z|)
>> ──────────────────────────────────────────────────
>> (Intercept)   2.65175   0.632317     4.19    <1e-4
>> ──────────────────────────────────────────────────
>> 
>> julia> m1 = fit(MixedModel, @formula(y ~ 1 + (1|group)), dd, Poisson(),
>> nAGQ=7)
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 7)
>>   y ~ 1 + (1 | group)
>>   Distribution: Poisson{Float64}
>>   Link: LogLink()
>> 
>>   Deviance: 193.5104
>> 
>> Variance components:
>>          Column    Variance  Std.Dev.
>> group (Intercept)  3.9577026 1.9893975
>> 
>>  Number of obs: 100; levels of grouping factors: 10
>> 
>> Fixed-effects parameters:
>> ──────────────────────────────────────────────────
>>              Estimate  Std.Error  z value  P(>|z|)
>> ──────────────────────────────────────────────────
>> (Intercept)   2.65175   0.632317     4.19    <1e-4
>> ──────────────────────────────────────────────────
>> 
>> As the person who wrote the first version of the nAGQ code in R I would not
>> be surprised if there was a constant dropped somewhere.  It is difficult
>> code.
>> 
>> And the results here in the Julia package make me uncomfortable because the
>> values of the parameter estimates are identical in the two fits.  I would
>> expect them to be close but not identical.
>> 
>> Isn't it good to know that there is still room for research in this area?
>> :-)
>> 
>> On Fri, Apr 24, 2020 at 9:05 AM Douglas Bates <bates using stat.wisc.edu> wrote:
>> 
>>> There's a lot of variability in your lambdas
>>> 
>>>> exp(3 + random_effects)
>>>  [1]  91.5919358   6.9678749   4.1841478  78.0771666 890.6931394
>>> 20.8558107
>>>  [7]   3.0037864   0.3049416   2.1675995  40.6209684
>>> 
>>> Do you really expect that some groups will have a mean count of nearly 900
>>> whereas others will have a mean count less than 1?
>>> 
>>> 
>>> On Wed, Apr 22, 2020 at 5:58 PM Ben Goldstein <ben.goldstein using berkeley.edu>
>>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I'm using lme4::glmer to estimate Poisson mixed models in a very simple
>>>> context (single random effect). I'm interested in the model
>>> likelihood/AIC
>>>> across many simulated datasets.
>>>> 
>>>> To investigate whether the Laplace approximation was appropriate for my
>>>> data context, I explored using the argument nAGQ to improve the accuracy
>>> of
>>>> the likelihood estimation. When I changed nAGQ to a value > 1, I saw an
>>>> unexpectedly huge change in the likelihood; log-likelihoods tended to be
>>>> off by ~200. Other statistics packages (e.g. GLMMadaptive) yield
>>> estimates
>>>> that agree with lme4's Laplace approximation, as did a manual likelihood
>>>> estimate, and not with the nAGQ > 2 estimate.
>>>> 
>>>> The following code reproduces the problem I'm encountering.
>>>> 
>>>> *# r-sig-mixed-models GLMM question*
>>>> library(lme4)
>>>> set.seed(51)
>>>> 
>>>> *# Simulate some random effect-driven Poisson data*
>>>> random_effects <- rnorm(10, 0, 2)
>>>> group <- rep(1:10, 10)
>>>> simulated_data <- data.frame(y = rpois(n = 100, lambda = exp(3 +
>>>> random_effects[group])),
>>>>                              group = group)
>>>> 
>>>> *# Fit models with Laplace (nAGQ = 1) and nAGQ = 11*
>>>> fit_Laplace <- glmer(y ~ (1|group), data = simulated_data, family =
>>>> poisson())
>>>> fit_AGQ <- glmer(y ~ (1|group), data = simulated_data, family =
>>> poisson(),
>>>> nAGQ = 11)
>>>> 
>>>> logLik(fit_Laplace)
>>>> logLik(fit_AGQ)
>>>> logLik(fit_Laplace) - logLik(fit_AGQ) *# Huge difference!*
>>>> 
>>>> When I execute the above code, I see a difference in likelihood of
>>>> -218.8894. I've tested across many simulations and on 2 different
>>> machines
>>>> (Mac and Linux). My version of lme4 is up to date.
>>>> 
>>>> Has anyone run into this issue before? Am I using the glmer function
>>> wrong,
>>>> or is it possible there's something going on under the hood?
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>>         [[alternative HTML version deleted]]
>>>> 
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>> 
>>>         [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> 
>> 	[[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models