[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??

Phillip Alday
Thu Aug 31 16:08:45 CEST 2023


I think this highlights the intersection of two big stumbling blocks:

1. a lot of the identities and nice properties of classical GLM theory 
don't hold in the mixed-model case. See also the coefficient of 
determination, clear notions of denominator degrees of freedom for F 
tests, etc.

2. a lot of R and S functionality around GLMs is built on quantities that 
are computationally convenient in classical GLM theory. This leads to some 
infelicities in naming, which point (1) further compounds for mixed models.

I think this highlights a real challenge for many users, also because of 
the way we teach statistics. (I'm teaching a course soon, can everyone 
tell?) Many intro statistics courses state things like "the coefficient 
of determination, R^2, is the squared Pearson correlation, r, and can be 
interpreted as ....". But we don't highlight that this is an identity 
that holds only for simple linear regression, not the formal definition 
of the coefficient of determination. We also don't emphasize how many of 
these definitions hinge on concepts that have no obvious extension to 
multi-level models, or for which none of the obvious extensions retains 
all the properties of the classical GLM form. (Most of this is discussed 
in the GLMM FAQ!)
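
As a minimal R sketch of that point (simulated data, nothing specific to 
mixed models):

    set.seed(1)
    x1 <- rnorm(100); x2 <- rnorm(100)
    y  <- 1 + x1 + 0.5 * x2 + rnorm(100)

    ## simple linear regression: R^2 really is the squared Pearson correlation
    m1 <- lm(y ~ x1)
    all.equal(summary(m1)$r.squared, cor(y, x1)^2)           # TRUE

    ## multiple regression: that identity is gone; R^2 is instead the squared
    ## correlation between the response and the fitted values
    m2 <- lm(y ~ x1 + x2)
    all.equal(summary(m2)$r.squared, cor(y, fitted(m2))^2)   # TRUE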

This is a criticism of myself as an instructor as much as anything, but 
it's something I like to emphasize in teaching nowadays and to highlight 
as a point of 're-education' for students who've had a more traditional 
education.

Phillip

On 8/31/23 08:45, Douglas Bates wrote:
> I have written about how I now understand the evaluation of the likelihood
> of a GLMM in an appendix of the in-progress book at
> https://juliamixedmodels.github.io/EmbraceUncertainty
>
> It is a difficult area to make sense of.  Even now, after more than 25
> years of thinking about these models, I am still not sure exactly how to
> define the objective in a GLMM for a family with a dispersion parameter.
>
> On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates using gmail.com> wrote:
>
>> I may have attributed the confusion about conditional and marginal
>> deviances to you, Juho, when in fact it is the fault of the authors of
>> lme4, of whom I am one.  If so, I apologize.
>>
>> We (the authors of lme4) had a discussion about the definition of deviance
>> many years ago and I am not sure what the upshot was.  It may have been
>> that we went with this incorrect definition of the "conditional deviance".
>> Unfortunately my R skills are sufficiently atrophied that I haven't been
>> able to look up what is actually evaluated.
>>
>> The area of generalized linear models and generalized linear mixed models
>> holds many traps for the unwary in terms of definitions, and some casual
>> definitions in the original glm function, going as far back as the S3
>> language, aid in the confusion.  For example, the dev.resids function in a
>> GLM family doesn't return the deviance residuals - it returns the squares of
>> the deviance residuals, which should be named the "unit deviances".
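>>
>> A quick way to see that in R (a small sketch, not specific to lme4):
>>
>>     fam <- binomial()
>>     y   <- c(0, 1, 1, 0)
>>     mu  <- c(0.2, 0.7, 0.9, 0.4)
>>     ud  <- fam$dev.resids(y, mu, wt = rep(1, 4))  # "unit deviances", despite the name
>>     dr  <- sign(y - mu) * sqrt(ud)                # the actual deviance residuals
>>     sum(ud)                                       # this sum is the deviance of a GLM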
>>
>> Given a response vector, the predicted mean vector, and a distribution
>> family, one can define the unit deviances, which play a role similar to
>> that of the squared residuals in the Gaussian family.  The sum of these
>> unit deviances is the deviance of a generalized linear model, but it is
>> not the deviance of a generalized linear mixed model, because the
>> likelihood for a GLMM involves both a "fidelity to the data" term *and*
>> a "complexity of the model" term, the latter being the size of the
>> random effects vector in a certain metric.
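>>
>> As a rough illustration of those two pieces in R (a sketch only, for some
>> fitted glmer model `m`; the Laplace/AGQ normalizing terms are omitted, and
>> exactly how these pieces enter what deviance() reports is the question at
>> hand):
>>
>>     library(lme4)
>>     ## m <- glmer(...)                  # a fitted Bernoulli glmer model
>>     y  <- getME(m, "y")
>>     mu <- fitted(m)                     # conditional means, given the modes of u
>>     u  <- getME(m, "u")                 # spherical random effects at their conditional modes
>>     sum(family(m)$dev.resids(y, mu, weights(m)))  # "fidelity to the data"
>>     sum(u^2)                                      # "complexity": squared length of u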
>>
>> So if indeed we used an inappropriate definition of deviance the mistake
>> is ours and we should correct it.
>>
>> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com> wrote:
>>
>>> You say you are comparing conditional deviances rather than marginal
>>> deviances.  Can you expand on that a bit further?  How are you evaluating
>>> the conditional deviances?
>>>
>>> The models are being fit according to what you describe as the marginal
>>> deviance - what I would call the deviance.  It is not surprising that what
>>> you are calling the conditional deviance is inconsistent with the nesting
>>> of the models because they weren't fit according to that criterion.
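>>>
>>> (In lme4 terms, the comparison on the criterion the models were actually
>>> fit by would be something like
>>>
>>>     anova(reduced, full)   # likelihood-ratio test on what I would call the deviance
>>>
>>> rather than a direct comparison of deviance() values.)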
>>>
>>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>>>             fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>         end
>>> Minimizing 141   Time: 0:00:00 ( 1.22 ms/it)
>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>    y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>>    Distribution: Bernoulli{Float64}
>>>    Link: LogitLink()
>>>
>>>     logLik    deviance     AIC       AICc        BIC
>>>   -1450.8163  2900.8511  2915.6326  2915.6833  2955.5600
>>>
>>> Variance components:
>>>        Column   Variance Std.Dev.
>>> id (Intercept)  0.270823 0.520406
>>>
>>>   Number of obs: 2217; levels of grouping factors: 331
>>>
>>> Fixed-effects parameters:
>>> ────────────────────────────────────────────────────
>>>                    Coef.  Std. Error      z  Pr(>|z|)
>>> ────────────────────────────────────────────────────
>>> (Intercept)   0.0969256   0.140211    0.69    0.4894
>>> x1: 1        -0.0449824   0.0553361  -0.81    0.4163
>>> x2            0.0744891   0.0133743   5.57    <1e-07
>>> x3           -0.548392    0.0914109  -6.00    <1e-08
>>> x4: B         0.390359    0.0803063   4.86    <1e-05
>>> x4: C         0.299932    0.0991249   3.03    0.0025
>>> ────────────────────────────────────────────────────
>>>
>>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>>>             fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>         end
>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>    y ~ 1 + x1 + x2 + x3 + (1 | id)
>>>    Distribution: Bernoulli{Float64}
>>>    Link: LogitLink()
>>>
>>>     logLik    deviance     AIC       AICc        BIC
>>>   -1472.4551  2944.0654  2954.9102  2954.9373  2983.4297
>>>
>>> Variance components:
>>>        Column   Variance Std.Dev.
>>> id (Intercept)  0.274331 0.523766
>>>
>>>   Number of obs: 2217; levels of grouping factors: 331
>>>
>>> Fixed-effects parameters:
>>> ────────────────────────────────────────────────────
>>>                    Coef.  Std. Error      z  Pr(>|z|)
>>> ────────────────────────────────────────────────────
>>> (Intercept)  -0.448496    0.100915   -4.44    <1e-05
>>> x1: 1        -0.0527684   0.0547746  -0.96    0.3354
>>> x2            0.0694393   0.0131541   5.28    <1e-06
>>> x3           -0.556903    0.0904798  -6.15    <1e-09
>>> ────────────────────────────────────────────────────
>>>
>>> julia> MixedModels.likelihoodratiotest(m02, m01)
>>> Model Formulae
>>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
>>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>> ──────────────────────────────────────────────────
>>>       model-dof   deviance       χ²  χ²-dof  P(>χ²)
>>> ──────────────────────────────────────────────────
>>> [1]          5  2944.0654
>>> [2]          7  2900.8511  43.2144       2  <1e-09
>>> ──────────────────────────────────────────────────
>>>
>>> I would note that your data are so imbalanced with respect to id that it
>>> is not surprising that you get unstable results.  (I changed your id column
>>> from integers to a factor so that 33 becomes S033.)
>>>
>>> 331×2 DataFrame
>>>   Row │ id      nrow
>>>       │ String  Int64
>>> ─────┼───────────────
>>>     1 │ S033     1227
>>>     2 │ S134       46
>>>     3 │ S295       45
>>>     4 │ S127       41
>>>     5 │ S125       33
>>>     6 │ S228       31
>>>     7 │ S193       23
>>>     8 │ S064       18
>>>     9 │ S281       16
>>>    10 │ S055       13
>>>    11 │ S035       13
>>>    12 │ S091       12
>>>    13 │ S175       11
>>>    14 │ S284       10
>>>    15 │ S159       10
>>>    ⋮  │   ⋮       ⋮
>>>   317 │ S324        1
>>>   318 │ S115        1
>>>   319 │ S192        1
>>>   320 │ S201        1
>>>   321 │ S156        1
>>>   322 │ S202        1
>>>   323 │ S067        1
>>>   324 │ S264        1
>>>   325 │ S023        1
>>>   326 │ S090        1
>>>   327 │ S195        1
>>>   328 │ S170        1
>>>   329 │ S241        1
>>>   330 │ S189        1
>>>   331 │ S213        1
>>>       301 rows omitted
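>>>
>>> A rough R equivalent of that recoding and per-id tabulation, assuming the
>>> data frame is `anon` with an integer id column as in the original post:
>>>
>>>     anon$id <- factor(sprintf("S%03d", anon$id))
>>>     head(sort(table(anon$id), decreasing = TRUE), 10)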
>>>
>>>
>>>
>>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
>>> juho.kristian.ruohonen using gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I thought it was impossible for the deviance to decrease when a term is
>>>> removed!?!? Yet I'm seeing it happen with this pair of relatively simple
>>>> Bernoulli GLMMs fit using lme4::glmer():
>>>>
>>>>> full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial, nAGQ = 6, data = anon)
>>>>> reduced <- update(full, ~. -x1)
>>>>> c(full = deviance(full), reduced = deviance(reduced))
>>>>
>>>>     full  reduced
>>>> 2808.671 2807.374
>>>>
>>>> What on earth is going on? FYI, I am deliberately comparing conditional
>>>> deviances rather than marginal ones, because quite a few of the clusters
>>>> are of inherent interest and likely to recur in future data.
>>>>
>>>> My anonymized datafile is attached.
>>>>
>>>> Best,
>>>>
>>>> Juho


