[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??
Ben Bolker
bbolker at gmail.com
Thu Aug 31 19:42:41 CEST 2023
For what it's worth we did attempt to document what we knew about
deviance here:
https://github.com/lme4/lme4/blob/master/man/merMod-class.Rd
(formatted version at https://rdrr.io/cran/lme4/man/merMod-class.html)
in particular the note that says
\item If adaptive Gauss-Hermite quadrature is used, then
\code{logLik(object)} is currently only proportional to the
absolute-unconditional log-likelihood.
To make things worse, this is about "absolute" vs "relative" and
"conditional" vs "unconditional"; "marginal" doesn't even come into it ...
I will take a look at the example myself if I get a chance, as I find
the result surprising too ...
cheers
Ben Bolker
On 2023-08-31 10:08 a.m., Phillip Alday wrote:
> I think this highlights the intersection of two big stumbling blocks:
>
> 1. a lot of identities / nice properties from classical GLM theory don't
> hold in the mixed-models case. See also the coefficient of determination,
> clear notions of denominator degrees of freedom for F tests, etc.
>
> 2. a lot of R and S functionality around GLMs is based on convenient
> computational quantities in classical GLM theory. This leads to some
> infelicity in naming, which is further confused by (1) for mixed models.
>
> I think this poses a real challenge for many users, also because of
> the way we teach statistics. (I'm teaching a course soon, can everyone
> tell?) Many intro statistics courses state things like "the coefficient
> of determination, R^2, is the squared Pearson correlation, r, and can be
> interpreted as ....". But we don't highlight that this is an identity
> that holds for simple linear regression, not the formal definition of
> the coefficient of determination (a toy sketch follows below). We also
> don't emphasize that many of these definitions hinge on concepts with
> no obvious extension to multi-level models, or that none of the obvious
> extensions keeps all the properties of the classical GLM form. (Most of
> this is discussed in the GLMM FAQ!)
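>
> A toy illustration of that identity (made-up data, not from the thread):
>
>    set.seed(1)
>    x <- rnorm(50)
>    y <- 2 * x + rnorm(50)
>    c(R2 = summary(lm(y ~ x))$r.squared, r2 = cor(x, y)^2)
>
> The two numbers coincide here only because this is simple linear
> regression with an intercept; the identity is not the definition.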
>
> This is a criticism of myself as an instructor as much as anything, but
> it's something I like to emphasize in teaching nowadays and highlight as
> a point of 're-education' for students who've had a more traditional
> education.
>
> Phillip
>
> On 8/31/23 08:45, Douglas Bates wrote:
>> I have written about how I now understand the evaluation of the
>> likelihood of a GLMM in an appendix of the in-progress book at
>> https://juliamixedmodels.github.io/EmbraceUncertainty
>>
>> It is a difficult area to make sense of. Even now, after more than 25
>> years of thinking about these models, I am still not sure exactly how to
>> define the objective in a GLMM for a family with a dispersion parameter.
>>
>> On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates at gmail.com> wrote:
>>
>>> I may have attributed the confusion about conditional and marginal
>>> deviances to you, Juho, when in fact it is the fault of the authors of
>>> lme4, of whom I am one. If so, I apologize.
>>>
>>> We (the authors of lme4) had a discussion about the definition of
>>> deviance many years ago and I am not sure what the upshot was. It may
>>> have been that we went with this incorrect definition of the
>>> "conditional deviance". Unfortunately my R skills are sufficiently
>>> atrophied that I haven't been able to look up what is actually
>>> evaluated.
>>>
>>> The area of generalized linear models and generalized linear mixed
>>> models holds many traps for the unwary in terms of definitions, and
>>> some casual definitions in the original glm function, going as far
>>> back as the S3 language, aid in the confusion. For example, the
>>> dev.resids function in a GLM family doesn't return the deviance
>>> residuals - it returns the squares of the deviance residuals, which
>>> would more accurately be called the "unit deviances".
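>>>
>>> A base-R sketch of that point (with an arbitrary logistic fit, for
>>> illustration only):
>>>
>>>    fit <- glm(am ~ mpg, family = binomial, data = mtcars)
>>>    ud  <- binomial()$dev.resids(mtcars$am, fitted(fit),
>>>                                 rep(1, nrow(mtcars)))
>>>    all.equal(sum(ud), deviance(fit))  # unit deviances sum to the deviance
>>>    all.equal(sign(mtcars$am - fitted(fit)) * sqrt(ud),
>>>              residuals(fit, type = "deviance"))  # the actual residuals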
>>>
>>> Given a response vector, the predicted mean vector, and a distribution
>>> family, one can define the unit deviances, which play a role similar
>>> to that of the squared residuals in the Gaussian family. The sum of
>>> these unit deviances is the deviance of a generalized linear model,
>>> but it is not the deviance of a generalized linear mixed model,
>>> because the likelihood for a GLMM involves both a "fidelity to the
>>> data" term *and* a "complexity of the model" term, which is the size
>>> of the random effects vector in a certain metric.
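>>>
>>> That decomposition can be sketched with lme4 accessors (a rough
>>> illustration, not lme4's internal code; the full Laplace approximation
>>> adds a log-determinant term on top of these two pieces):
>>>
>>>    library(lme4)
>>>    gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>>                data = cbpp, family = binomial)
>>>    ud <- binomial()$dev.resids(cbpp$incidence / cbpp$size, fitted(gm),
>>>                                cbpp$size)   # unit deviances ("fidelity")
>>>    u  <- getME(gm, "u")                     # spherical random effects
>>>    c(fidelity = sum(ud), complexity = sum(u^2))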
>>>
>>> So if indeed we used an inappropriate definition of deviance the mistake
>>> is ours and we should correct it.
>>>
>>> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates at gmail.com> wrote:
>>>
>>>> You say you are comparing conditional deviances rather than marginal
>>>> deviances. Can you expand on that a bit further? How are you
>>>> evaluating the conditional deviances?
>>>>
>>>> The models are being fit according to what you describe as the
>>>> marginal deviance - what I would call the deviance. It is not
>>>> surprising that what you are calling the conditional deviance is
>>>> inconsistent with the nesting of the models, because they weren't
>>>> fit according to that criterion.
>>>>
>>>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>> end
>>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>> y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>>> Distribution: Bernoulli{Float64}
>>>> Link: LogitLink()
>>>>
>>>> logLik deviance AIC AICc BIC
>>>> -1450.8163 2900.8511 2915.6326 2915.6833 2955.5600
>>>>
>>>> Variance components:
>>>> Column Variance Std.Dev.
>>>> id (Intercept) 0.270823 0.520406
>>>>
>>>> Number of obs: 2217; levels of grouping factors: 331
>>>>
>>>> Fixed-effects parameters:
>>>> ────────────────────────────────────────────────────
>>>> Coef. Std. Error z Pr(>|z|)
>>>> ────────────────────────────────────────────────────
>>>> (Intercept) 0.0969256 0.140211 0.69 0.4894
>>>> x1: 1 -0.0449824 0.0553361 -0.81 0.4163
>>>> x2 0.0744891 0.0133743 5.57 <1e-07
>>>> x3 -0.548392 0.0914109 -6.00 <1e-08
>>>> x4: B 0.390359 0.0803063 4.86 <1e-05
>>>> x4: C 0.299932 0.0991249 3.03 0.0025
>>>> ────────────────────────────────────────────────────
>>>>
>>>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>> end
>>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>> y ~ 1 + x1 + x2 + x3 + (1 | id)
>>>> Distribution: Bernoulli{Float64}
>>>> Link: LogitLink()
>>>>
>>>> logLik deviance AIC AICc BIC
>>>> -1472.4551 2944.0654 2954.9102 2954.9373 2983.4297
>>>>
>>>> Variance components:
>>>> Column Variance Std.Dev.
>>>> id (Intercept) 0.274331 0.523766
>>>>
>>>> Number of obs: 2217; levels of grouping factors: 331
>>>>
>>>> Fixed-effects parameters:
>>>> ────────────────────────────────────────────────────
>>>> Coef. Std. Error z Pr(>|z|)
>>>> ────────────────────────────────────────────────────
>>>> (Intercept) -0.448496 0.100915 -4.44 <1e-05
>>>> x1: 1 -0.0527684 0.0547746 -0.96 0.3354
>>>> x2 0.0694393 0.0131541 5.28 <1e-06
>>>> x3 -0.556903 0.0904798 -6.15 <1e-09
>>>> ────────────────────────────────────────────────────
>>>>
>>>> julia> MixedModels.likelihoodratiotest(m02, m01)
>>>> Model Formulae
>>>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
>>>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>>> ──────────────────────────────────────────────────
>>>> model-dof deviance χ² χ²-dof P(>χ²)
>>>> ──────────────────────────────────────────────────
>>>> [1] 5 2944.0654
>>>> [2] 7 2900.8511 43.2144 2 <1e-09
>>>> ──────────────────────────────────────────────────
>>>>
>>>> I would note that your data are so imbalanced with respect to id that
>>>> it is not surprising that you get unstable results. (I changed your
>>>> id column from integers to a factor so that 33 becomes S033.)
>>>>
>>>> 331×2 DataFrame
>>>> Row │ id nrow
>>>> │ String Int64
>>>> ─────┼───────────────
>>>> 1 │ S033 1227
>>>> 2 │ S134 46
>>>> 3 │ S295 45
>>>> 4 │ S127 41
>>>> 5 │ S125 33
>>>> 6 │ S228 31
>>>> 7 │ S193 23
>>>> 8 │ S064 18
>>>> 9 │ S281 16
>>>> 10 │ S055 13
>>>> 11 │ S035 13
>>>> 12 │ S091 12
>>>> 13 │ S175 11
>>>> 14 │ S284 10
>>>> 15 │ S159 10
>>>> ⋮ │ ⋮ ⋮
>>>> 317 │ S324 1
>>>> 318 │ S115 1
>>>> 319 │ S192 1
>>>> 320 │ S201 1
>>>> 321 │ S156 1
>>>> 322 │ S202 1
>>>> 323 │ S067 1
>>>> 324 │ S264 1
>>>> 325 │ S023 1
>>>> 326 │ S090 1
>>>> 327 │ S195 1
>>>> 328 │ S170 1
>>>> 329 │ S241 1
>>>> 330 │ S189 1
>>>> 331 │ S213 1
>>>> 301 rows omitted
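>>>>
>>>> The same tabulation in R would be along these lines (a sketch,
>>>> assuming the anon data frame from the original post):
>>>>
>>>>    sort(table(anon$id), decreasing = TRUE)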
>>>>
>>>>
>>>>
>>>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
>>>> juho.kristian.ruohonen at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I thought it was impossible for the deviance to decrease when a term
>>>>> is removed!? Yet I'm seeing it happen with this pair of relatively
>>>>> simple Bernoulli GLMMs fit using lme4::glmer():
>>>>>
>>>>> full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial,
>>>>>               nAGQ = 6, data = anon)
>>>>> reduced <- update(full, ~ . - x1)
>>>>> c(full = deviance(full), reduced = deviance(reduced))
>>>>>
>>>>>     full  reduced
>>>>> 2808.671 2807.374
>>>>>
>>>>> What on earth is going on? FYI, I am deliberately comparing
>>>>> conditional deviances rather than marginal ones, because quite a few
>>>>> of the clusters are of inherent interest and likely to recur in
>>>>> future data.
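>>>>>
>>>>> A quick check on the marginal scale (a sketch reusing the fits above;
>>>>> -2 * logLik is the criterion the models were actually optimized on,
>>>>> so it should not decrease when a term is dropped, up to optimization
>>>>> and quadrature error):
>>>>>
>>>>>    c(full = -2 * c(logLik(full)), reduced = -2 * c(logLik(reduced)))
>>>>>    anova(reduced, full)  # likelihood-ratio test on marginal deviances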
>>>>>
>>>>> My anonymized datafile is attached.
>>>>>
>>>>> Best,
>>>>>
>>>>> Juho