[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??

Thu Aug 31 15:37:22 CEST 2023

I may have attributed the confusion about conditional and marginal
deviances to you, Juho, when in fact it is the fault of the authors of
lme4, of whom I am one.  If so, I apologize.

We (the authors of lme4) had a discussion about the definition of deviance
many years ago and I am not sure what the upshot was.  It may have been
that we went with this incorrect definition of the "conditional deviance".
Unfortunately my R skills are sufficiently atrophied that I haven't been
able to look up what is actually evaluated.

The area of generalized linear models and generalized linear mixed models
holds many traps for the unwary in terms of definitions, and some casual
definitions in the original glm function, going as far back as the S3
language, aid in the confusion.  For example, the dev.resid function in a
GLM family doesn't return the deviance residuals - it returns the square of
the deviance residuals, which should be named the "unit deviances".

Given a response vector and the predicted mean vector and a distribution
family one can define the unit deviances, which play a role similar to the
squared residual in the Gaussian family.  The sum of these unit deviances
is the deviance of a generalized linear model, but it is not the deviance
of a generalized linear mixed model, because the likelihood for a GLMM
involves both a "fidelity to the data" term *and* a "complexity of the
model" term, which is the size of the random effects vector in a certain
metric.

So if indeed we used an inappropriate definition of deviance the mistake is
ours and we should correct it.

On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com> wrote:

> You say you are comparing conditional deviances rather than marginal
> deviances.  Can you expand on that a bit further?  How are you evaluating
> the conditional deviances?
>
> The models are being fit according to what you describe as the marginal
> deviance - what I would call the deviance.  It is not surprising that what
> you are calling the conditional deviance is inconsistent with the nesting
> of the models because they weren't fit according to that criterion.
>
> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
>            fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>        end
> Minimizing 141   Time: 0:00:00 ( 1.22 ms/it)
> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>   y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
>   Distribution: Bernoulli{Float64}
>   Link: LogitLink()
>
>    logLik    deviance     AIC       AICc        BIC
>  -1450.8163  2900.8511  2915.6326  2915.6833  2955.5600
>
> Variance components:
>       Column   Variance Std.Dev.
> id (Intercept)  0.270823 0.520406
>
>  Number of obs: 2217; levels of grouping factors: 331
>
> Fixed-effects parameters:
> ────────────────────────────────────────────────────
>                   Coef.  Std. Error      z  Pr(>|z|)
> ────────────────────────────────────────────────────
> (Intercept)   0.0969256   0.140211    0.69    0.4894
> x1: 1        -0.0449824   0.0553361  -0.81    0.4163
> x2            0.0744891   0.0133743   5.57    <1e-07
> x3           -0.548392    0.0914109  -6.00    <1e-08
> x4: B         0.390359    0.0803063   4.86    <1e-05
> x4: C         0.299932    0.0991249   3.03    0.0025
> ────────────────────────────────────────────────────
>
> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
>            fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
>        end
> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
>   y ~ 1 + x1 + x2 + x3 + (1 | id)
>   Distribution: Bernoulli{Float64}
>   Link: LogitLink()
>
>    logLik    deviance     AIC       AICc        BIC
>  -1472.4551  2944.0654  2954.9102  2954.9373  2983.4297
>
> Variance components:
>       Column   Variance Std.Dev.
> id (Intercept)  0.274331 0.523766
>
>  Number of obs: 2217; levels of grouping factors: 331
>
> Fixed-effects parameters:
> ────────────────────────────────────────────────────
>                   Coef.  Std. Error      z  Pr(>|z|)
> ────────────────────────────────────────────────────
> (Intercept)  -0.448496    0.100915   -4.44    <1e-05
> x1: 1        -0.0527684   0.0547746  -0.96    0.3354
> x2            0.0694393   0.0131541   5.28    <1e-06
> x3           -0.556903    0.0904798  -6.15    <1e-09
> ────────────────────────────────────────────────────
>
> julia> MixedModels.likelihoodratiotest(m02, m01)
> Model Formulae
> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
> ──────────────────────────────────────────────────
>      model-dof   deviance       χ²  χ²-dof  P(>χ²)
> ──────────────────────────────────────────────────
> [1]          5  2944.0654
> [2]          7  2900.8511  43.2144       2  <1e-09
> ──────────────────────────────────────────────────
>
> I would note that your data are so imbalanced with respect to id that it
> is not surprising that you get unstable results.  (I changed your id column
> from integers to a factor so that 33 becomes S033.)
>
> 331×2 DataFrame
>  Row │ id      nrow
>      │ String  Int64
> ─────┼───────────────
>    1 │ S033     1227
>    2 │ S134       46
>    3 │ S295       45
>    4 │ S127       41
>    5 │ S125       33
>    6 │ S228       31
>    7 │ S193       23
>    8 │ S064       18
>    9 │ S281       16
>   10 │ S055       13
>   11 │ S035       13
>   12 │ S091       12
>   13 │ S175       11
>   14 │ S284       10
>   15 │ S159       10
>   ⋮  │   ⋮       ⋮
>  317 │ S324        1
>  318 │ S115        1
>  319 │ S192        1
>  320 │ S201        1
>  321 │ S156        1
>  322 │ S202        1
>  323 │ S067        1
>  324 │ S264        1
>  325 │ S023        1
>  326 │ S090        1
>  327 │ S195        1
>  328 │ S170        1
>  329 │ S241        1
>  330 │ S189        1
>  331 │ S213        1
>      301 rows omitted
>
>
>
> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
> juho.kristian.ruohonen using gmail.com> wrote:
>
>> Hi,
>>
>> I thought it was impossible for deviance to decrease when a term is
>> removed!?!? Yet I'm seeing it happen with this pair of relatively simple
>> Bernoulli GLMMs fit using lme4:glmer():
>>
>> > full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial, nAGQ =
>> > 6, data = anon)
>>
>> > reduced <- update(full, ~. -x1)
>>
>> > c(full = deviance(full), reduced = deviance(reduced))
>>
>>
>> *     full  reduced*
>> *2808.671 2807.374 *
>>
>> What on earth going on? FYI, I am deliberately comparing conditional
>> deviances rather than marginal ones, because quite a few of the clusters
>> are of inherent interest and likely to recur in future data.
>>
>> My anonymized datafile is attached.
>>
>> Best,
>>
>> Juho
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>

	[[alternative HTML version deleted]]