[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??
Ben Bolker
bbolker at gmail.com
Thu Aug 31 19:42:41 CEST 2023
For what it's worth we did attempt to document what we knew about
deviance here:
https://github.com/lme4/lme4/blob/master/man/merMod-class.Rd
(formatted version at https://rdrr.io/cran/lme4/man/merMod-class.html)
in particular the note that says
\item If adaptive Gauss-Hermite quadrature is used, then
\code{logLik(object)} is currently only proportional to the
absolute-unconditional log-likelihood.
To make things worse, this is about "absolute" vs "relative" and
"conditional" vs "unconditional"; "marginal" doesn't even come into it ...
I will take a look at the example myself if I get a chance, as I find
the result surprising too ...
cheers
Ben Bolker
On 2023-08-31 10:08 a.m., Phillip Alday wrote:
> I think this highlights the intersection of two big stumbling blocks:
>
> 1. a lot of identities / nice properties from classical GLM theory don't
> hold in the mixed-models case. See also the coefficient of determination,
> clear notions of denominator degrees of freedom for F tests, etc.
>
> 2. a lot of R and S functionality around GLMs is based on convenient
> computational quantities in classical GLM theory. This leads to some
> infelicity in naming, which is further confused by (1) for mixed models.
>
> I think this poses a real challenge for many users, also because of
> the way we teach statistics. (I'm teaching a course soon, can everyone
> tell?) Many intro statistics courses state things like "the coefficient
> of determination, R^2, is the squared Pearson correlation, r, and can be
> interpreted as ....". But we don't highlight that this is an identity
> that holds for simple linear regression, not the formal definition of
> the coefficient of determination (a toy sketch follows below). We also
> don't emphasize that many of these definitions hinge on concepts with
> no obvious extension to multi-level models, or that none of the obvious
> extensions keeps all the properties of the classical GLM form. (Most of
> this is discussed in the GLMM FAQ!)
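>
> A toy illustration of that identity (made-up data, not from the thread):
>
>    set.seed(1)
>    x <- rnorm(50)
>    y <- 2 * x + rnorm(50)
>    c(R2 = summary(lm(y ~ x))$r.squared, r2 = cor(x, y)^2)
>
> The two numbers coincide here only because this is simple linear
> regression with an intercept; the identity is not the definition.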
>
> This is a criticism of myself as an instructor as much as anything, but
> it's something I like to emphasize in teaching nowadays and highlight as
> a point of 're-education' for students who've had a more traditional
> education.
>
> Phillip
>
> On 8/31/23 08:45, Douglas Bates wrote:
>> I have written about how I now understand the evaluation of the
>> likelihood of a GLMM in an appendix of the in-progress book at
>> https://juliamixedmodels.github.io/EmbraceUncertainty
>>
>> It is a difficult area to make sense of. Even now, after more than 25
>> years of thinking about these models, I am still not sure exactly how to
>> define the objective in a GLMM for a family with a dispersion parameter.
>>
>> On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates at gmail.com> wrote:
>>
>>> I may have attributed the confusion about conditional and marginal
>>> deviances to you, Juho, when in fact it is the fault of the authors of
>>> lme4, of whom I am one. If so, I apologize.
>>>
>>> We (the authors of lme4) had a discussion about the definition of
>>> deviance many years ago and I am not sure what the upshot was. It may
>>> have been that we went with this incorrect definition of the
>>> "conditional deviance". Unfortunately my R skills are sufficiently
>>> atrophied that I haven't been able to look up what is actually
>>> evaluated.
>>>
>>> The area of generalized linear models and generalized linear mixed
>>> models holds many traps for the unwary in terms of definitions, and
>>> some casual definitions in the original glm function, going as far
>>> back as the S3 language, aid in the confusion. For example, the
>>> dev.resids function in a GLM family doesn't return the deviance
>>> residuals - it returns the squares of the deviance residuals, which
>>> would more accurately be called the "unit deviances".
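>>>
>>> A base-R sketch of that point (with an arbitrary logistic fit, for
>>> illustration only):
>>>
>>>    fit <- glm(am ~ mpg, family = binomial, data = mtcars)
>>>    ud  <- binomial()$dev.resids(mtcars$am, fitted(fit),
>>>                                 rep(1, nrow(mtcars)))
>>>    all.equal(sum(ud), deviance(fit))  # unit deviances sum to the deviance
>>>    all.equal(sign(mtcars$am - fitted(fit)) * sqrt(ud),
>>>              residuals(fit, type = "deviance"))  # the actual residuals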
>>>
>>> Given a response vector, the predicted mean vector, and a distribution
>>> family, one can define the unit deviances, which play a role similar
>>> to that of the squared residuals in the Gaussian family. The sum of
>>> these unit deviances is the deviance of a generalized linear model,
>>> but it is not the deviance of a generalized linear mixed model,
>>> because the likelihood for a GLMM involves both a "fidelity to the
>>> data" term *and* a "complexity of the model" term, which is the size
>>> of the random effects vector in a certain metric.
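>>>
>>> That decomposition can be sketched with lme4 accessors (a rough
>>> illustration, not lme4's internal code; the full Laplace approximation
>>> adds a log-determinant term on top of these two pieces):
>>>
>>>    library(lme4)
>>>    gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>>                data = cbpp, family = binomial)
>>>    ud <- binomial()$dev.resids(cbpp$incidence / cbpp$size, fitted(gm),
>>>                                cbpp$size)   # unit deviances ("fidelity")
>>>    u  <- getME(gm, "u")                     # spherical random effects
>>>    c(fidelity = sum(ud), complexity = sum(u^2))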
>>>
>>> So if indeed we used an inappropriate definition of deviance the mistake
>>> is ours and we should correct it.
>>>
>>> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates at gmail.com> wrote:
>>>
>>>> You say you are comparing conditional deviances rather than marginal
>>>> deviances. Can you expand on that a bit further? How are you
>>>> evaluating the conditional deviances?
>>>>
>>>> The models are being fit according to what you describe as the
>>>> marginal deviance - what I would call the deviance. It is not
>>>> surprising that what you are calling the conditional deviance is
>>>> inconsistent with the nesting of the models, because they weren't
>>>> fit according to that criterion.
>>>>
>>>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>> end
>>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>> y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>>> Distribution: Bernoulli{Float64}
>>>> Link: LogitLink()
>>>>
>>>> logLik deviance AIC AICc BIC
>>>> -1450.8163 2900.8511 2915.6326 2915.6833 2955.5600
>>>>
>>>> Variance components:
>>>> Column Variance Std.Dev.
>>>> id (Intercept) 0.270823 0.520406
>>>>
>>>> Number of obs: 2217; levels of grouping factors: 331
>>>>
>>>> Fixed-effects parameters:
>>>> ────────────────────────────────────────────────────
>>>> Coef. Std. Error z Pr(>|z|)
>>>> ────────────────────────────────────────────────────
>>>> (Intercept) 0.0969256 0.140211 0.69 0.4894
>>>> x1: 1 -0.0449824 0.0553361 -0.81 0.4163
>>>> x2 0.0744891 0.0133743 5.57 <1e-07
>>>> x3 -0.548392 0.0914109 -6.00 <1e-08
>>>> x4: B 0.390359 0.0803063 4.86 <1e-05
>>>> x4: C 0.299932 0.0991249 3.03 0.0025
>>>> ────────────────────────────────────────────────────
>>>>
>>>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>>>> end
>>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>>>> y ~ 1 + x1 + x2 + x3 + (1 | id)
>>>> Distribution: Bernoulli{Float64}
>>>> Link: LogitLink()
>>>>
>>>> logLik deviance AIC AICc BIC
>>>> -1472.4551 2944.0654 2954.9102 2954.9373 2983.4297
>>>>
>>>> Variance components:
>>>> Column Variance Std.Dev.
>>>> id (Intercept) 0.274331 0.523766
>>>>
>>>> Number of obs: 2217; levels of grouping factors: 331
>>>>
>>>> Fixed-effects parameters:
>>>> ────────────────────────────────────────────────────
>>>> Coef. Std. Error z Pr(>|z|)
>>>> ────────────────────────────────────────────────────
>>>> (Intercept) -0.448496 0.100915 -4.44 <1e-05
>>>> x1: 1 -0.0527684 0.0547746 -0.96 0.3354
>>>> x2 0.0694393 0.0131541 5.28 <1e-06
>>>> x3 -0.556903 0.0904798 -6.15 <1e-09
>>>> ────────────────────────────────────────────────────
>>>>
>>>> julia> MixedModels.likelihoodratiotest(m02, m01)
>>>> Model Formulae
>>>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
>>>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>>>> ──────────────────────────────────────────────────
>>>> model-dof deviance χ² χ²-dof P(>χ²)
>>>> ──────────────────────────────────────────────────
>>>> [1] 5 2944.0654
>>>> [2] 7 2900.8511 43.2144 2 <1e-09
>>>> ──────────────────────────────────────────────────
>>>>
>>>> I would note that your data are so imbalanced with respect to id that
>>>> it is not surprising that you get unstable results. (I changed your
>>>> id column from integers to a factor so that 33 becomes S033.)
>>>>
>>>> 331×2 DataFrame
>>>> Row │ id nrow
>>>> │ String Int64
>>>> ─────┼───────────────
>>>> 1 │ S033 1227
>>>> 2 │ S134 46
>>>> 3 │ S295 45
>>>> 4 │ S127 41
>>>> 5 │ S125 33
>>>> 6 │ S228 31
>>>> 7 │ S193 23
>>>> 8 │ S064 18
>>>> 9 │ S281 16
>>>> 10 │ S055 13
>>>> 11 │ S035 13
>>>> 12 │ S091 12
>>>> 13 │ S175 11
>>>> 14 │ S284 10
>>>> 15 │ S159 10
>>>> ⋮ │ ⋮ ⋮
>>>> 317 │ S324 1
>>>> 318 │ S115 1
>>>> 319 │ S192 1
>>>> 320 │ S201 1
>>>> 321 │ S156 1
>>>> 322 │ S202 1
>>>> 323 │ S067 1
>>>> 324 │ S264 1
>>>> 325 │ S023 1
>>>> 326 │ S090 1
>>>> 327 │ S195 1
>>>> 328 │ S170 1
>>>> 329 │ S241 1
>>>> 330 │ S189 1
>>>> 331 │ S213 1
>>>> 301 rows omitted
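>>>>
>>>> The same tabulation in R would be along these lines (a sketch,
>>>> assuming the anon data frame from the original post):
>>>>
>>>>    sort(table(anon$id), decreasing = TRUE)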
>>>>
>>>>
>>>>
>>>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
>>>> juho.kristian.ruohonen at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I thought it was impossible for the deviance to decrease when a term
>>>>> is removed!? Yet I'm seeing it happen with this pair of relatively
>>>>> simple Bernoulli GLMMs fit using lme4::glmer():
>>>>>
>>>>> full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial,
>>>>>               nAGQ = 6, data = anon)
>>>>> reduced <- update(full, ~ . - x1)
>>>>> c(full = deviance(full), reduced = deviance(reduced))
>>>>>
>>>>>     full  reduced
>>>>> 2808.671 2807.374
>>>>>
>>>>> What on earth is going on? FYI, I am deliberately comparing
>>>>> conditional deviances rather than marginal ones, because quite a few
>>>>> of the clusters are of inherent interest and likely to recur in
>>>>> future data.
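>>>>>
>>>>> A quick check on the marginal scale (a sketch reusing the fits above;
>>>>> -2 * logLik is the criterion the models were actually optimized on,
>>>>> so it should not decrease when a term is dropped, up to optimization
>>>>> and quadrature error):
>>>>>
>>>>>    c(full = -2 * c(logLik(full)), reduced = -2 * c(logLik(reduced)))
>>>>>    anova(reduced, full)  # likelihood-ratio test on marginal deviances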
>>>>>
>>>>> My anonymized datafile is attached.
>>>>>
>>>>> Best,
>>>>>
>>>>> Juho