[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??

Juho Kristian Ruohonen juho.kristian.ruohonen at gmail.com
Thu Aug 31 21:38:51 CEST 2023


...and here I was thinking "unconditional likelihood" was just another
synonym for the "marginal" or "average" likelihood I see discussed in stats
books...

Oh well, Douglas has succinctly solved my main problem. Any further
insights are just icing on the cake.

Thanks again to all,

J

On Thu, 31 Aug 2023 at 20:43, Ben Bolker (bbolker using gmail.com) wrote:

>    For what it's worth, we did attempt to document what we knew about
> deviance here:
>
> https://github.com/lme4/lme4/blob/master/man/merMod-class.Rd
> (formatted version at  https://rdrr.io/cran/lme4/man/merMod-class.html)
>
> in particular the note that says
>
> \item If adaptive Gauss-Hermite quadrature is used, then
>      \code{logLik(object)} is currently only proportional to the
>      absolute-unconditional log-likelihood.
>
>    To make things worse, this is about "absolute" vs "relative" and
> "conditional" vs "unconditional"; "marginal" doesn't even come into it ...
>
>    I will take a look at the example myself if I get a chance, as I find
> the result surprising too ...
>
>    cheers
>      Ben Bolker
>
>
>
> On 2023-08-31 10:08 a.m., Phillip Alday wrote:
> > I think this highlights the intersection of two big stumbling blocks:
> >
> > 1. a lot of identities / niceness in classical GLM theory don't hold
> > in the mixed-models case. See also the coefficient of determination, clear
> > notions of denominator degrees of freedom for F tests, etc.
> >
> > 2. a lot of R and S functionality around GLMs is based on convenient
> > computational quantities in classical GLM theory. This leads to some
> > infelicity in naming, which is further confused by (1) for mixed models.
> >
> > I think this highlights a real challenge to many users, also because of
> > the way we teach statistics. (I'm teaching a course soon, can everyone
> > tell?) Many intro statistics courses state things like "the coefficient
> > of determination, R^2, is the squared Pearson correlation, r, and can be
> > interpreted as ....". But we don't highlight that that's an identity for
> > simple linear regression and not the formal definition of the
> > coefficient of determination. We also don't emphasize how many of these
> > definitions hinge on concepts that have no obvious extension to
> > multi-level models, or for which none of the obvious extensions retains
> > all the properties of the classical GLM form. (Most of this is discussed
> > in the GLMM FAQ!)
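> >
> > (A throwaway base-R illustration of that identity -- made-up data,
> > nothing mixed-model-specific:
> >
> > set.seed(1)
> > x <- rnorm(100); y <- 2 * x + rnorm(100)
> > fit <- lm(y ~ x)
> > all.equal(summary(fit)$r.squared, cor(x, y)^2)  ## TRUE for simple lm;
> >                                                 ## no such identity for
> >                                                 ## GLMMs.)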
> >
> > This is a criticism of myself as an instructor as much as anything, but
> > it's something I like to emphasize in teaching nowadays and highlight as
> > a point of 're-education' for students who've had a more traditional
> > education.
> >
> > Phillip
> >
> > On 8/31/23 08:45, Douglas Bates wrote:
> >> I have written about how I now understand the evaluation of the
> >> likelihood
> >> of a GLMM in an appendix of the in-progress book at
> >> https://juliamixedmodels.github.io/EmbraceUncertainty
> >>
> >> It is a difficult area to make sense of.  Even now, after more than 25
> >> years of thinking about these models, I am still not sure exactly how to
> >> define the objective in a GLMM for a family with a dispersion parameter.
> >>
> >> On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates using gmail.com> wrote:
> >>
> >>> I may have attributed the confusion about conditional and marginal
> >>> deviances to you, Juho, when in fact it is the fault of the authors of
> >>> lme4, of whom I am one.  If so, I apologize.
> >>>
> >>> We (the authors of lme4) had a discussion about the definition of
> >>> deviance
> >>> many years ago and I am not sure what the upshot was.  It may have been
> >>> that we went with this incorrect definition of the "conditional
> >>> deviance".
> >>> Unfortunately my R skills are sufficiently atrophied that I haven't
> >>> been able to look up what is actually evaluated.
> >>>
> >>> The area of generalized linear models and generalized linear mixed
> >>> models holds many traps for the unwary in terms of definitions, and
> >>> some casual naming choices in the original glm function, going as far
> >>> back as the S3 language, add to the confusion.  For example, the
> >>> dev.resids function in a GLM family doesn't return the deviance
> >>> residuals - it returns the squares of the deviance residuals, which
> >>> should properly be called the "unit deviances".
> >>>
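> >>> A quick check of this in base R (a sketch with invented numbers):
> >>>
> >>> fam <- binomial()
> >>> y  <- c(0, 1, 1, 0)
> >>> mu <- c(0.2, 0.7, 0.9, 0.4)
> >>> ud <- fam$dev.resids(y, mu, wt = rep(1, 4))  ## the "unit deviances"
> >>> dr <- sign(y - mu) * sqrt(ud)                ## the deviance residuals
> >>> sum(ud)  ## the GLM deviance contribution of these observations
> >>>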
> >>> Given a response vector, a predicted mean vector, and a distribution
> >>> family, one can define the unit deviances, which play a role similar
> >>> to that of the squared residuals in the Gaussian family.  The sum of
> >>> these unit deviances is the deviance of a generalized linear model,
> >>> but it is not the deviance of a generalized linear mixed model,
> >>> because the likelihood for a GLMM involves both a "fidelity to the
> >>> data" term *and* a "complexity of the model" term, the latter being
> >>> the size of the random-effects vector in a certain metric.
> >>>
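> >>> In lme4 notation that decomposition looks schematically like the
> >>> following (a sketch only -- the actual objective also involves a
> >>> log-determinant term, and `fit` is a hypothetical glmerMod object):
> >>>
> >>> fidelity   <- sum(residuals(fit, type = "deviance")^2)  ## unit deviances
> >>> complexity <- sum(getME(fit, "u")^2)  ## squared length of the spherical
> >>>                                       ## random-effects vector
> >>>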
> >>> So if indeed we used an inappropriate definition of deviance, the
> >>> mistake is ours and we should correct it.
> >>>
> >>> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com> wrote:
> >>>
> >>>> You say you are comparing conditional deviances rather than marginal
> >>>> deviances.  Can you expand on that a bit further?  How are you
> >>>> evaluating the conditional deviances?
> >>>>
> >>>> The models are being fit according to what you describe as the
> >>>> marginal deviance - what I would call the deviance.  It is not
> >>>> surprising that what you are calling the conditional deviance is
> >>>> inconsistent with the nesting of the models, because they weren't
> >>>> fit according to that criterion.
> >>>>
> >>>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
> >>>>             fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
> >>>>         end
> >>>> Minimizing 141   Time: 0:00:00 ( 1.22 ms/it)
> >>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
> >>>>    y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
> >>>>    Distribution: Bernoulli{Float64}
> >>>>    Link: LogitLink()
> >>>>
> >>>>     logLik    deviance     AIC       AICc        BIC
> >>>>   -1450.8163  2900.8511  2915.6326  2915.6833  2955.5600
> >>>>
> >>>> Variance components:
> >>>>        Column   Variance Std.Dev.
> >>>> id (Intercept)  0.270823 0.520406
> >>>>
> >>>>   Number of obs: 2217; levels of grouping factors: 331
> >>>>
> >>>> Fixed-effects parameters:
> >>>> ────────────────────────────────────────────────────
> >>>>                    Coef.  Std. Error      z  Pr(>|z|)
> >>>> ────────────────────────────────────────────────────
> >>>> (Intercept)   0.0969256   0.140211    0.69    0.4894
> >>>> x1: 1        -0.0449824   0.0553361  -0.81    0.4163
> >>>> x2            0.0744891   0.0133743   5.57    <1e-07
> >>>> x3           -0.548392    0.0914109  -6.00    <1e-08
> >>>> x4: B         0.390359    0.0803063   4.86    <1e-05
> >>>> x4: C         0.299932    0.0991249   3.03    0.0025
> >>>> ────────────────────────────────────────────────────
> >>>>
> >>>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
> >>>>             fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
> >>>>         end
> >>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
> >>>>    y ~ 1 + x1 + x2 + x3 + (1 | id)
> >>>>    Distribution: Bernoulli{Float64}
> >>>>    Link: LogitLink()
> >>>>
> >>>>     logLik    deviance     AIC       AICc        BIC
> >>>>   -1472.4551  2944.0654  2954.9102  2954.9373  2983.4297
> >>>>
> >>>> Variance components:
> >>>>        Column   Variance Std.Dev.
> >>>> id (Intercept)  0.274331 0.523766
> >>>>
> >>>>   Number of obs: 2217; levels of grouping factors: 331
> >>>>
> >>>> Fixed-effects parameters:
> >>>> ────────────────────────────────────────────────────
> >>>>                    Coef.  Std. Error      z  Pr(>|z|)
> >>>> ────────────────────────────────────────────────────
> >>>> (Intercept)  -0.448496    0.100915   -4.44    <1e-05
> >>>> x1: 1        -0.0527684   0.0547746  -0.96    0.3354
> >>>> x2            0.0694393   0.0131541   5.28    <1e-06
> >>>> x3           -0.556903    0.0904798  -6.15    <1e-09
> >>>> ────────────────────────────────────────────────────
> >>>>
> >>>> julia> MixedModels.likelihoodratiotest(m02, m01)
> >>>> Model Formulae
> >>>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
> >>>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
> >>>> ──────────────────────────────────────────────────
> >>>>       model-dof   deviance       χ²  χ²-dof  P(>χ²)
> >>>> ──────────────────────────────────────────────────
> >>>> [1]          5  2944.0654
> >>>> [2]          7  2900.8511  43.2144       2  <1e-09
> >>>> ──────────────────────────────────────────────────
> >>>>
> >>>> I would note that your data are so imbalanced with respect to id
> >>>> that it is not surprising that you get unstable results.  (I changed
> >>>> your id column from integers to a factor so that 33 becomes S033.)
> >>>>
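> >>>> (My guess at the tabulation, for anyone following along -- something
> >>>> like this DataFrames.jl call, where `dat` is the data table used in
> >>>> the fits above:)
> >>>>
> >>>> julia> sort(combine(groupby(dat, :id), nrow), :nrow; rev = true)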
> >>>> 331×2 DataFrame
> >>>>   Row │ id      nrow
> >>>>       │ String  Int64
> >>>> ─────┼───────────────
> >>>>     1 │ S033     1227
> >>>>     2 │ S134       46
> >>>>     3 │ S295       45
> >>>>     4 │ S127       41
> >>>>     5 │ S125       33
> >>>>     6 │ S228       31
> >>>>     7 │ S193       23
> >>>>     8 │ S064       18
> >>>>     9 │ S281       16
> >>>>    10 │ S055       13
> >>>>    11 │ S035       13
> >>>>    12 │ S091       12
> >>>>    13 │ S175       11
> >>>>    14 │ S284       10
> >>>>    15 │ S159       10
> >>>>    ⋮  │   ⋮       ⋮
> >>>>   317 │ S324        1
> >>>>   318 │ S115        1
> >>>>   319 │ S192        1
> >>>>   320 │ S201        1
> >>>>   321 │ S156        1
> >>>>   322 │ S202        1
> >>>>   323 │ S067        1
> >>>>   324 │ S264        1
> >>>>   325 │ S023        1
> >>>>   326 │ S090        1
> >>>>   327 │ S195        1
> >>>>   328 │ S170        1
> >>>>   329 │ S241        1
> >>>>   330 │ S189        1
> >>>>   331 │ S213        1
> >>>>       301 rows omitted
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen
> >>>> <juho.kristian.ruohonen using gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I thought it was impossible for deviance to decrease when a term is
> >>>>> removed!?!? Yet I'm seeing it happen with this pair of relatively
> >>>>> simple Bernoulli GLMMs fit using lme4::glmer():
> >>>>>
> >>>>>> full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial, nAGQ = 6, data = anon)
> >>>>>> reduced <- update(full, ~. -x1)
> >>>>>> c(full = deviance(full), reduced = deviance(reduced))
> >>>>>
> >>>>>      full  reduced
> >>>>> 2808.671 2807.374
> >>>>>
> >>>>> What on earth is going on? FYI, I am deliberately comparing
> >>>>> conditional deviances rather than marginal ones, because quite a few
> >>>>> of the clusters are of inherent interest and likely to recur in
> >>>>> future data.
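> >>>>>
> >>>>> For reference, the marginal comparison -- the criterion glmer
> >>>>> actually optimizes -- would be something like this (a sketch,
> >>>>> assuming both models were fit with the same nAGQ):
> >>>>>
> >>>>>> c(full = -2 * logLik(full), reduced = -2 * logLik(reduced))
> >>>>>> anova(reduced, full)  # likelihood-ratio test of the nested fits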
> >>>>>
> >>>>> My anonymized datafile is attached.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Juho
>
