[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??
Douglas Bates
dmbates sending from gmail.com
Thu Aug 31 15:45:10 CEST 2023
I have written about how I now understand the evaluation of the likelihood
of a GLMM in an appendix of the in-progress book at
https://juliamixedmodels.github.io/EmbraceUncertainty
It is a difficult area to make sense of. Even now, after more than 25
years of thinking about these models, I am still not sure exactly how to
define the objective in a GLMM for a family with a dispersion parameter.
On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates using gmail.com> wrote:
> I may have attributed the confusion about conditional and marginal
> deviances to you, Juho, when in fact it is the fault of the authors of
> lme4, of whom I am one. If so, I apologize.
>
> We (the authors of lme4) had a discussion about the definition of deviance
> many years ago and I am not sure what the upshot was. It may have been
> that we went with this incorrect definition of the "conditional deviance".
> Unfortunately my R skills are sufficiently atrophied that I haven't been
> able to look up what is actually evaluated.
>
> The area of generalized linear models and generalized linear mixed models
> holds many traps for the unwary in terms of definitions, and some casual
> definitions in the original glm function, going as far back as the S3
> language, aid in the confusion. For example, the dev.resid function in a
> GLM family doesn't return the deviance residuals - it returns the square of
> the deviance residuals, which should be named the "unit deviances".
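A minimal Python sketch of the distinction (this is not R or lme4 code, and the responses and fitted means are made up purely for illustration):

```python
import math

def bernoulli_unit_deviance(y, mu):
    # Unit deviance for a Bernoulli response: the *square* of the
    # deviance residual, -2*[y*log(mu) + (1 - y)*log(1 - mu)].
    return -2.0 * (y * math.log(mu) + (1 - y) * math.log(1 - mu))

# Hypothetical 0/1 responses and fitted means, for illustration only.
y  = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.2]

# The GLM deviance is the sum of the unit deviances.  For a Bernoulli
# family the saturated model has log-likelihood zero, so this sum
# equals -2 * log-likelihood of the fitted model.
deviance = sum(bernoulli_unit_deviance(yi, mi) for yi, mi in zip(y, mu))
loglik = sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
             for yi, mi in zip(y, mu))
assert abs(deviance + 2.0 * loglik) < 1e-12
```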
>
> Given a response vector and the predicted mean vector and a distribution
> family one can define the unit deviances, which play a role similar to the
> squared residual in the Gaussian family. The sum of these unit deviances
> is the deviance of a generalized linear model, but it is not the deviance
> of a generalized linear mixed model, because the likelihood for a GLMM
> involves both a "fidelity to the data" term *and* a "complexity of the
> model" term, which is the size of the random effects vector in a certain
> metric.
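That penalized objective can be sketched in Python with made-up numbers (in an actual fit the random-effects values would be the conditional modes, and sigma2_b the estimated variance component; the values of y, mu, b, and sigma2_b below are hypothetical):

```python
import math

def bernoulli_unit_deviance(y, mu):
    return -2.0 * (y * math.log(mu) + (1 - y) * math.log(1 - mu))

def penalized_deviance(y, mu, b, sigma2_b):
    """Fidelity-to-data term plus model-complexity term.

    The complexity term is the squared length of the random-effects
    vector in the metric of its covariance -- for scalar random
    intercepts with variance sigma2_b, that is sum(b_i^2) / sigma2_b.
    """
    fidelity = sum(bernoulli_unit_deviance(yi, mi) for yi, mi in zip(y, mu))
    complexity = sum(bi * bi for bi in b) / sigma2_b
    return fidelity + complexity

# Hypothetical data, conditional means, and random-effect values.
y  = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.2]
b  = [0.4, -0.2, 0.1]
obj = penalized_deviance(y, mu, b, sigma2_b=0.27)
```

Dropping the complexity term leaves only the sum of unit deviances, and it is that unpenalized sum which is not the criterion the models were fit by, so nothing forces it to behave monotonically across nested fits.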
>
> So if indeed we used an inappropriate definition of deviance the mistake
> is ours and we should correct it.
>
> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com> wrote:
>
>> You say you are comparing conditional deviances rather than marginal
>> deviances. Can you expand on that a bit further? How are you evaluating
>> the conditional deviances?
>>
>> The models are being fit according to what you describe as the marginal
>> deviance - what I would call the deviance. It is not surprising that what
>> you are calling the conditional deviance is inconsistent with the nesting
>> of the models because they weren't fit according to that criterion.
>>
>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>> end
>> Minimizing 141 Time: 0:00:00 ( 1.22 ms/it)
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>> y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>> Distribution: Bernoulli{Float64}
>> Link: LogitLink()
>>
>> logLik deviance AIC AICc BIC
>> -1450.8163 2900.8511 2915.6326 2915.6833 2955.5600
>>
>> Variance components:
>> Column Variance Std.Dev.
>> id (Intercept) 0.270823 0.520406
>>
>> Number of obs: 2217; levels of grouping factors: 331
>>
>> Fixed-effects parameters:
>> ────────────────────────────────────────────────────
>> Coef. Std. Error z Pr(>|z|)
>> ────────────────────────────────────────────────────
>> (Intercept) 0.0969256 0.140211 0.69 0.4894
>> x1: 1 -0.0449824 0.0553361 -0.81 0.4163
>> x2 0.0744891 0.0133743 5.57 <1e-07
>> x3 -0.548392 0.0914109 -6.00 <1e-08
>> x4: B 0.390359 0.0803063 4.86 <1e-05
>> x4: C 0.299932 0.0991249 3.03 0.0025
>> ────────────────────────────────────────────────────
>>
>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>> end
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>> y ~ 1 + x1 + x2 + x3 + (1 | id)
>> Distribution: Bernoulli{Float64}
>> Link: LogitLink()
>>
>> logLik deviance AIC AICc BIC
>> -1472.4551 2944.0654 2954.9102 2954.9373 2983.4297
>>
>> Variance components:
>> Column Variance Std.Dev.
>> id (Intercept) 0.274331 0.523766
>>
>> Number of obs: 2217; levels of grouping factors: 331
>>
>> Fixed-effects parameters:
>> ────────────────────────────────────────────────────
>> Coef. Std. Error z Pr(>|z|)
>> ────────────────────────────────────────────────────
>> (Intercept) -0.448496 0.100915 -4.44 <1e-05
>> x1: 1 -0.0527684 0.0547746 -0.96 0.3354
>> x2 0.0694393 0.0131541 5.28 <1e-06
>> x3 -0.556903 0.0904798 -6.15 <1e-09
>> ────────────────────────────────────────────────────
>>
>> julia> MixedModels.likelihoodratiotest(m02, m01)
>> Model Formulae
>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>> ──────────────────────────────────────────────────
>> model-dof deviance χ² χ²-dof P(>χ²)
>> ──────────────────────────────────────────────────
>> [1] 5 2944.0654
>> [2] 7 2900.8511 43.2144 2 <1e-09
>> ──────────────────────────────────────────────────
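The test statistic and p-value above can be checked directly from the two reported deviances; for a chi-square with 2 degrees of freedom the survival function is simply exp(-x/2), so the check needs no special library (a quick Python sketch):

```python
import math

dev_reduced = 2944.0654   # reported deviance of m02 (5 parameters)
dev_full    = 2900.8511   # reported deviance of m01 (7 parameters)

chisq = dev_reduced - dev_full   # likelihood-ratio statistic
df = 7 - 5

# For a chi-square variate with 2 degrees of freedom, P(X > x) = exp(-x/2).
p = math.exp(-chisq / 2.0)
assert p < 1e-9   # agrees with the reported "<1e-09"
```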
>>
>> I would note that your data are so imbalanced with respect to id that it
>> is not surprising that you get unstable results. (I changed your id column
>> from integers to a factor so that 33 becomes S033.)
>>
>> 331×2 DataFrame
>> Row │ id nrow
>> │ String Int64
>> ─────┼───────────────
>> 1 │ S033 1227
>> 2 │ S134 46
>> 3 │ S295 45
>> 4 │ S127 41
>> 5 │ S125 33
>> 6 │ S228 31
>> 7 │ S193 23
>> 8 │ S064 18
>> 9 │ S281 16
>> 10 │ S055 13
>> 11 │ S035 13
>> 12 │ S091 12
>> 13 │ S175 11
>> 14 │ S284 10
>> 15 │ S159 10
>> ⋮ │ ⋮ ⋮
>> 317 │ S324 1
>> 318 │ S115 1
>> 319 │ S192 1
>> 320 │ S201 1
>> 321 │ S156 1
>> 322 │ S202 1
>> 323 │ S067 1
>> 324 │ S264 1
>> 325 │ S023 1
>> 326 │ S090 1
>> 327 │ S195 1
>> 328 │ S170 1
>> 329 │ S241 1
>> 330 │ S189 1
>> 331 │ S213 1
>> 301 rows omitted
>>
>>
>>
>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
>> juho.kristian.ruohonen using gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I thought it was impossible for deviance to decrease when a term is
>>> removed!?!? Yet I'm seeing it happen with this pair of relatively simple
>>> Bernoulli GLMMs fit using lme4::glmer():
>>>
>>> > full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial,
>>> +               nAGQ = 6, data = anon)
>>>
>>> > reduced <- update(full, ~. -x1)
>>>
>>> > c(full = deviance(full), reduced = deviance(reduced))
>>>
>>>
>>>     full  reduced
>>> 2808.671 2807.374
>>>
>>> What on earth is going on? FYI, I am deliberately comparing conditional
>>> deviances rather than marginal ones, because quite a few of the clusters
>>> are of inherent interest and likely to recur in future data.
>>>
>>> My anonymized datafile is attached.
>>>
>>> Best,
>>>
>>> Juho
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>