[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??
Douglas Bates
dmbates sending from gmail.com
Thu Aug 31 15:45:10 CEST 2023
I have written about how I now understand the evaluation of the likelihood
of a GLMM in an appendix of the in-progress book at
https://juliamixedmodels.github.io/EmbraceUncertainty
It is a difficult area to make sense of. Even now, after more than 25
years of thinking about these models, I am still not sure exactly how to
define the objective in a GLMM for a family with a dispersion parameter.
On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates using gmail.com> wrote:
> I may have attributed the confusion about conditional and marginal
> deviances to you, Juho, when in fact it is the fault of the authors of
> lme4, of whom I am one. If so, I apologize.
>
> We (the authors of lme4) had a discussion about the definition of deviance
> many years ago and I am not sure what the upshot was. It may have been
> that we went with this incorrect definition of the "conditional deviance".
> Unfortunately my R skills are sufficiently atrophied that I haven't been
> able to look up what is actually evaluated.
>
> The area of generalized linear models and generalized linear mixed models
> holds many traps for the unwary in terms of definitions, and some casual
> definitions in the original glm function, going as far back as the S3
> language, aid in the confusion. For example, the dev.resid function in a
> GLM family doesn't return the deviance residuals - it returns the square of
> the deviance residuals, which should be named the "unit deviances".
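A minimal Python sketch of the distinction (this is not R or lme4 code, and the responses and fitted means are made up purely for illustration):

```python
import math

def bernoulli_unit_deviance(y, mu):
    # Unit deviance for a Bernoulli response: the *square* of the
    # deviance residual, -2*[y*log(mu) + (1 - y)*log(1 - mu)].
    return -2.0 * (y * math.log(mu) + (1 - y) * math.log(1 - mu))

# Hypothetical 0/1 responses and fitted means, for illustration only.
y  = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.2]

# The GLM deviance is the sum of the unit deviances.  For a Bernoulli
# family the saturated model has log-likelihood zero, so this sum
# equals -2 * log-likelihood of the fitted model.
deviance = sum(bernoulli_unit_deviance(yi, mi) for yi, mi in zip(y, mu))
loglik = sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
             for yi, mi in zip(y, mu))
assert abs(deviance + 2.0 * loglik) < 1e-12
```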
>
> Given a response vector and the predicted mean vector and a distribution
> family one can define the unit deviances, which play a role similar to the
> squared residual in the Gaussian family. The sum of these unit deviances
> is the deviance of a generalized linear model, but it is not the deviance
> of a generalized linear mixed model, because the likelihood for a GLMM
> involves both a "fidelity to the data" term *and* a "complexity of the
> model" term, which is the size of the random effects vector in a certain
> metric.
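That penalized objective can be sketched in Python with made-up numbers (in an actual fit the random-effects values would be the conditional modes, and sigma2_b the estimated variance component; the values of y, mu, b, and sigma2_b below are hypothetical):

```python
import math

def bernoulli_unit_deviance(y, mu):
    return -2.0 * (y * math.log(mu) + (1 - y) * math.log(1 - mu))

def penalized_deviance(y, mu, b, sigma2_b):
    """Fidelity-to-data term plus model-complexity term.

    The complexity term is the squared length of the random-effects
    vector in the metric of its covariance -- for scalar random
    intercepts with variance sigma2_b, that is sum(b_i^2) / sigma2_b.
    """
    fidelity = sum(bernoulli_unit_deviance(yi, mi) for yi, mi in zip(y, mu))
    complexity = sum(bi * bi for bi in b) / sigma2_b
    return fidelity + complexity

# Hypothetical data, conditional means, and random-effect values.
y  = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.2]
b  = [0.4, -0.2, 0.1]
obj = penalized_deviance(y, mu, b, sigma2_b=0.27)
```

Dropping the complexity term leaves only the sum of unit deviances, and it is that unpenalized sum which is not the criterion the models were fit by, so nothing forces it to behave monotonically across nested fits.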
>
> So if indeed we used an inappropriate definition of deviance the mistake
> is ours and we should correct it.
>
> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com> wrote:
>
>> You say you are comparing conditional deviances rather than marginal
>> deviances. Can you expand on that a bit further? How are you evaluating
>> the conditional deviances?
>>
>> The models are being fit according to what you describe as the marginal
>> deviance - what I would call the deviance. It is not surprising that what
>> you are calling the conditional deviance is inconsistent with the nesting
>> of the models because they weren't fit according to that criterion.
>>
>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>> end
>> Minimizing 141 Time: 0:00:00 ( 1.22 ms/it)
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>> y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>> Distribution: Bernoulli{Float64}
>> Link: LogitLink()
>>
>> logLik deviance AIC AICc BIC
>> -1450.8163 2900.8511 2915.6326 2915.6833 2955.5600
>>
>> Variance components:
>> Column Variance Std.Dev.
>> id (Intercept) 0.270823 0.520406
>>
>> Number of obs: 2217; levels of grouping factors: 331
>>
>> Fixed-effects parameters:
>> ────────────────────────────────────────────────────
>> Coef. Std. Error z Pr(>|z|)
>> ────────────────────────────────────────────────────
>> (Intercept) 0.0969256 0.140211 0.69 0.4894
>> x1: 1 -0.0449824 0.0553361 -0.81 0.4163
>> x2 0.0744891 0.0133743 5.57 <1e-07
>> x3 -0.548392 0.0914109 -6.00 <1e-08
>> x4: B 0.390359 0.0803063 4.86 <1e-05
>> x4: C 0.299932 0.0991249 3.03 0.0025
>> ────────────────────────────────────────────────────
>>
>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>> end
>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>> y ~ 1 + x1 + x2 + x3 + (1 | id)
>> Distribution: Bernoulli{Float64}
>> Link: LogitLink()
>>
>> logLik deviance AIC AICc BIC
>> -1472.4551 2944.0654 2954.9102 2954.9373 2983.4297
>>
>> Variance components:
>> Column Variance Std.Dev.
>> id (Intercept) 0.274331 0.523766
>>
>> Number of obs: 2217; levels of grouping factors: 331
>>
>> Fixed-effects parameters:
>> ────────────────────────────────────────────────────
>> Coef. Std. Error z Pr(>|z|)
>> ────────────────────────────────────────────────────
>> (Intercept) -0.448496 0.100915 -4.44 <1e-05
>> x1: 1 -0.0527684 0.0547746 -0.96 0.3354
>> x2 0.0694393 0.0131541 5.28 <1e-06
>> x3 -0.556903 0.0904798 -6.15 <1e-09
>> ────────────────────────────────────────────────────
>>
>> julia> MixedModels.likelihoodratiotest(m02, m01)
>> Model Formulae
>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>> ──────────────────────────────────────────────────
>> model-dof deviance χ² χ²-dof P(>χ²)
>> ──────────────────────────────────────────────────
>> [1] 5 2944.0654
>> [2] 7 2900.8511 43.2144 2 <1e-09
>> ──────────────────────────────────────────────────
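The test statistic and p-value above can be checked directly from the two reported deviances; for a chi-square with 2 degrees of freedom the survival function is simply exp(-x/2), so the check needs no special library (a quick Python sketch):

```python
import math

dev_reduced = 2944.0654   # reported deviance of m02 (5 parameters)
dev_full    = 2900.8511   # reported deviance of m01 (7 parameters)

chisq = dev_reduced - dev_full   # likelihood-ratio statistic
df = 7 - 5

# For a chi-square variate with 2 degrees of freedom, P(X > x) = exp(-x/2).
p = math.exp(-chisq / 2.0)
assert p < 1e-9   # agrees with the reported "<1e-09"
```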
>>
>> I would note that your data are so imbalanced with respect to id that it
>> is not surprising that you get unstable results. (I changed your id column
>> from integers to a factor so that 33 becomes S033.)
>>
>> 331×2 DataFrame
>> Row │ id nrow
>> │ String Int64
>> ─────┼───────────────
>> 1 │ S033 1227
>> 2 │ S134 46
>> 3 │ S295 45
>> 4 │ S127 41
>> 5 │ S125 33
>> 6 │ S228 31
>> 7 │ S193 23
>> 8 │ S064 18
>> 9 │ S281 16
>> 10 │ S055 13
>> 11 │ S035 13
>> 12 │ S091 12
>> 13 │ S175 11
>> 14 │ S284 10
>> 15 │ S159 10
>> ⋮ │ ⋮ ⋮
>> 317 │ S324 1
>> 318 │ S115 1
>> 319 │ S192 1
>> 320 │ S201 1
>> 321 │ S156 1
>> 322 │ S202 1
>> 323 │ S067 1
>> 324 │ S264 1
>> 325 │ S023 1
>> 326 │ S090 1
>> 327 │ S195 1
>> 328 │ S170 1
>> 329 │ S241 1
>> 330 │ S189 1
>> 331 │ S213 1
>> 301 rows omitted
>>
>>
>>
>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
>> juho.kristian.ruohonen using gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I thought it was impossible for deviance to decrease when a term is
>>> removed!?!? Yet I'm seeing it happen with this pair of relatively simple
>>> Bernoulli GLMMs fit using lme4::glmer():
>>>
>>> > full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial,
>>> +               nAGQ = 6, data = anon)
>>>
>>> > reduced <- update(full, ~. -x1)
>>>
>>> > c(full = deviance(full), reduced = deviance(reduced))
>>>
>>>
>>>     full  reduced
>>> 2808.671 2807.374
>>>
>>> What on earth is going on? FYI, I am deliberately comparing conditional
>>> deviances rather than marginal ones, because quite a few of the clusters
>>> are of inherent interest and likely to recur in future data.
>>>
>>> My anonymized datafile is attached.
>>>
>>> Best,
>>>
>>> Juho
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>