[R-sig-ME] glmer conditional deviance DECREASING after removal of fixed effect??
Juho Kristian Ruohonen
juho.kristian.ruohonen at gmail.com
Thu Aug 31 21:38:51 CEST 2023
...and here I was thinking "unconditional likelihood" was just another
synonym for the "marginal" or "average" likelihood I see discussed in stats
books...
Oh well, Douglas has succinctly solved my main problem. Any further
insights are just icing on the cake.
Thanks again to all,
J
On Thu, Aug 31, 2023 at 8:43 PM Ben Bolker (bbolker using gmail.com) wrote:
> For what it's worth we did attempt to document what we knew about
> deviance here:
>
> https://github.com/lme4/lme4/blob/master/man/merMod-class.Rd
> (formatted version at https://rdrr.io/cran/lme4/man/merMod-class.html)
>
> in particular the note that says
>
> \item If adaptive Gauss-Hermite quadrature is used, then
> \code{logLik(object)} is currently only proportional to the
> absolute-unconditional log-likelihood.
>
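For intuition about what that note means: the quantity adaptive Gauss-Hermite quadrature approximates is the marginal likelihood, with the random effects integrated out. A minimal *non-adaptive* Gauss-Hermite sketch in Python for a single-cluster Bernoulli model with one random intercept (the function name and toy setup are illustrative, not lme4's internals):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik(y, eta_fixed, sigma, n_quad=9):
    """Approximate the marginal log-likelihood of a Bernoulli GLMM with a
    single random intercept b ~ N(0, sigma^2) for ONE cluster, integrating
    b out by (non-adaptive) Gauss-Hermite quadrature.  `y` holds the 0/1
    responses and `eta_fixed` the fixed-effects part of the linear predictor.
    Illustrative only -- lme4 centers and scales the nodes adaptively."""
    nodes, weights = hermgauss(n_quad)      # nodes/weights for weight exp(-x^2)
    total = 0.0
    for x, w in zip(nodes, weights):
        b = np.sqrt(2.0) * sigma * x        # change of variables to N(0, sigma^2)
        mu = 1.0 / (1.0 + np.exp(-(eta_fixed + b)))
        cond = np.prod(mu ** y * (1.0 - mu) ** (1.0 - y))   # p(y | b)
        total += w * cond
    return np.log(total / np.sqrt(np.pi))
```

With `sigma = 0` this collapses to the ordinary Bernoulli log-likelihood, which is a handy sanity check.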
> To make things worse, this is about "absolute" vs "relative" and
> "conditional" vs "unconditional"; "marginal" doesn't even come into it ...
>
> I will take a look at the example myself if I get a chance, as I find
> the result surprising too ...
>
> cheers
> Ben Bolker
>
>
>
> On 2023-08-31 10:08 a.m., Phillip Alday wrote:
> > I think this highlights the intersection of two big stumbling blocks:
> >
> > 1. a lot of identities / niceness in classical GLM theory doesn't hold
> > in the mixed models case. See also coefficient of determination, clear
> > notions of denominator degrees of freedom for F tests, etc.
> >
> > 2. a lot of R and S functionality around GLMs is based on convenient
> > computational quantities in classical GLM theory. This leads to some
> > infelicity in naming, which is further confused by (1) for mixed models.
> >
> > I think this highlights a real challenge for many users, also because of
> > the way we teach statistics. (I'm teaching a course soon, can everyone
> > tell?) Many intro statistics courses state things like "the coefficient
> > of determination, R^2, is the squared Pearson correlation, r, and can be
> > interpreted as ....". But we don't highlight that that's an identity for
> > simple linear regression and not the formal definition of the
> > coefficient of determination. We also don't emphasize how many of these
> > definitions hinge on concepts that don't have an obvious extension to
> > multi-level models, or that none of the obvious extensions holds all
> > the properties of the classical GLM form. (Most of this is discussed in
> > the GLMM FAQ!)
> >
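Phillip's R² point is easy to check numerically. A short Python sketch with simulated (made-up) data: the formal definition, 1 minus the residual sum of squares over the total sum of squares, coincides with the squared Pearson correlation exactly in the simple-linear-regression case, and it is only that case the intro-course "identity" covers:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)

# Ordinary least squares fit of y ~ 1 + x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Formal definition: R^2 = 1 - SS_residual / SS_total
r2_def = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# The "identity" taught in intro courses: squared Pearson correlation
r = np.corrcoef(x, y)[0, 1]
print(r2_def, r ** 2)   # agree (to rounding) for simple linear regression
```

Neither side of that identity has an agreed-upon analogue once random effects enter, which is exactly the stumbling block described above.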
> > This is a criticism of myself as an instructor as much as anything, but
> > it's something I like to emphasize in teaching nowadays and highlight as
> > a point 're-education' for students who've had a more traditional
> > education.
> >
> > Phillip
> >
> > On 8/31/23 08:45, Douglas Bates wrote:
> >> I have written about how I now understand the evaluation of the
> >> likelihood
> >> of a GLMM in an appendix of the in-progress book at
> >> https://juliamixedmodels.github.io/EmbraceUncertainty
> >>
> >> It is a difficult area to make sense of. Even now, after more than 25
> >> years of thinking about these models, I am still not sure exactly how to
> >> define the objective in a GLMM for a family with a dispersion parameter.
> >>
> >> On Thu, Aug 31, 2023 at 8:37 AM Douglas Bates <dmbates using gmail.com>
> wrote:
> >>
> >>> I may have attributed the confusion about conditional and marginal
> >>> deviances to you, Juho, when in fact it is the fault of the authors of
> >>> lme4, of whom I am one. If so, I apologize.
> >>>
> >>> We (the authors of lme4) had a discussion about the definition of
> >>> deviance
> >>> many years ago and I am not sure what the upshot was. It may have been
> >>> that we went with this incorrect definition of the "conditional
> >>> deviance".
> >>> Unfortunately my R skills are sufficiently atrophied that I haven't
> been
> >>> able to look up what is actually evaluated.
> >>>
> >>> The area of generalized linear models and generalized linear mixed
> >>> models
> >>> holds many traps for the unwary in terms of definitions, and some
> casual
> >>> definitions in the original glm function, going as far back as the S3
> >>> language, aid in the confusion. For example, the dev.resid function
> >>> in a
> >>> GLM family doesn't return the deviance residuals - it returns the
> >>> square of
> >>> the deviance residuals, which should be named the "unit deviances".
> >>>
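To make the terminology concrete, here is a small Python sketch (not R's actual implementation) of Bernoulli unit deviances, i.e. the *squared* deviance residuals that a GLM family's dev.resid function actually returns. For 0/1 data the saturated log-likelihood is zero, so their sum, the GLM deviance, equals minus twice the log-likelihood:

```python
import numpy as np

def bernoulli_unit_deviances(y, mu):
    """Unit deviances for a Bernoulli response: the *squared* deviance
    residuals, analogous to what R's binomial()$dev.resids returns.
    Saturated log-likelihood is 0 for 0/1 data, so d_i = -2 * log p(y_i | mu_i)."""
    return -2.0 * (y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))

# Made-up responses and fitted means, purely for illustration
y = np.array([1.0, 0.0, 1.0, 1.0])
mu = np.array([0.8, 0.3, 0.6, 0.9])

deviance = bernoulli_unit_deviances(y, mu).sum()
loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
print(deviance, -2 * loglik)   # identical: deviance = -2 * log-likelihood here
```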
> >>> Given a response vector and the predicted mean vector and a
> distribution
> >>> family one can define the unit deviances, which play a role similar
> >>> to the
> >>> squared residual in the Gaussian family. The sum of these unit
> >>> deviances
> >>> is the deviance of a generalized linear model, but it is not the
> >>> deviance
> >>> of a generalized linear mixed model, because the likelihood for a GLMM
> >>> involves both a "fidelity to the data" term *and* a "complexity of the
> >>> model" term, which is the size of the random effects vector in a
> certain
> >>> metric.
> >>>
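The "fidelity plus complexity" structure can be sketched in a few lines of Python. This is only a schematic of the shape of the objective (one scalar theta, Bernoulli responses, random effects held fixed at u), not lme4's actual evaluation, which also involves a Laplace or quadrature correction:

```python
import numpy as np

def penalized_deviance(y, X, Z, beta, u, theta):
    """Schematic GLMM objective: a 'fidelity to the data' term (the sum of
    Bernoulli unit deviances) plus a 'model complexity' term (the squared
    length of the spherical random effects u).  Here theta scales u into
    the random effects b = theta * u.  Illustrative, not lme4's code."""
    eta = X @ beta + Z @ (theta * u)
    mu = 1.0 / (1.0 + np.exp(-eta))
    unit_dev = -2.0 * (y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))
    return unit_dev.sum() + u @ u   # fidelity + complexity
```

With u = 0 the penalty vanishes and the value reduces to the plain GLM deviance, which is why the GLM deviance alone is *not* the deviance of a GLMM.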
> >>> So if indeed we used an inappropriate definition of deviance the
> mistake
> >>> is ours and we should correct it.
> >>>
> >>> On Thu, Aug 31, 2023 at 7:56 AM Douglas Bates <dmbates using gmail.com>
> wrote:
> >>>
> >>>> You say you are comparing conditional deviances rather than marginal
> >>>> deviances. Can you expand on that a bit further? How are you
> >>>> evaluating
> >>>> the conditional deviances?
> >>>>
> >>>> The models are being fit according to what you describe as the
> marginal
> >>>> deviance - what I would call the deviance. It is not surprising
> >>>> that what
> >>>> you are calling the conditional deviance is inconsistent with the
> >>>> nesting
> >>>> of the models because they weren't fit according to that criterion.
> >>>>
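The general phenomenon, that a criterion you did not optimize need not be monotone under model nesting, can be seen in a toy example with nothing mixed-model-specific about it:

```python
# Toy analogy: fits minimize one criterion (f, standing in for the marginal
# deviance), but we then compare a *different* quantity (g, standing in for
# the conditional deviance) at the two minimizers.  Nesting guarantees
# monotonicity only for the criterion that was actually optimized.

def f(a, b):            # the optimized criterion
    return (a - 1.0) ** 2 + (b - 2.0) ** 2

def g(a, b):            # a different criterion, merely evaluated at the fit
    return (a + b) ** 2

# "Full model": minimize f over both a and b  ->  (a, b) = (1, 2)
full = (1.0, 2.0)
# "Reduced model": fix b = 0, minimize f over a  ->  (a, b) = (1, 0)
reduced = (1.0, 0.0)

print(f(*full), f(*reduced))   # 0.0 vs 4.0: the optimized criterion is monotone
print(g(*full), g(*reduced))   # 9.0 vs 1.0: g went *down* after the removal
```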
> >>>> julia> m01 = let f = @formula y ~ 1 + x1 + x2 + x3 + x4 + (1|id)
> >>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
> >>>> end
> >>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
> >>>> y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
> >>>> Distribution: Bernoulli{Float64}
> >>>> Link: LogitLink()
> >>>>
> >>>> logLik deviance AIC AICc BIC
> >>>> -1450.8163 2900.8511 2915.6326 2915.6833 2955.5600
> >>>>
> >>>> Variance components:
> >>>> Column Variance Std.Dev.
> >>>> id (Intercept) 0.270823 0.520406
> >>>>
> >>>> Number of obs: 2217; levels of grouping factors: 331
> >>>>
> >>>> Fixed-effects parameters:
> >>>> ────────────────────────────────────────────────────
> >>>> Coef. Std. Error z Pr(>|z|)
> >>>> ────────────────────────────────────────────────────
> >>>> (Intercept) 0.0969256 0.140211 0.69 0.4894
> >>>> x1: 1 -0.0449824 0.0553361 -0.81 0.4163
> >>>> x2 0.0744891 0.0133743 5.57 <1e-07
> >>>> x3 -0.548392 0.0914109 -6.00 <1e-08
> >>>> x4: B 0.390359 0.0803063 4.86 <1e-05
> >>>> x4: C 0.299932 0.0991249 3.03 0.0025
> >>>> ────────────────────────────────────────────────────
> >>>>
> >>>> julia> m02 = let f = @formula y ~ 1 + x1 + x2 + x3 + (1|id)
> >>>> fit(MixedModel, f, dat, Bernoulli(); contrasts, nAGQ=9)
> >>>> end
> >>>> Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 9)
> >>>> y ~ 1 + x1 + x2 + x3 + (1 | id)
> >>>> Distribution: Bernoulli{Float64}
> >>>> Link: LogitLink()
> >>>>
> >>>> logLik deviance AIC AICc BIC
> >>>> -1472.4551 2944.0654 2954.9102 2954.9373 2983.4297
> >>>>
> >>>> Variance components:
> >>>> Column Variance Std.Dev.
> >>>> id (Intercept) 0.274331 0.523766
> >>>>
> >>>> Number of obs: 2217; levels of grouping factors: 331
> >>>>
> >>>> Fixed-effects parameters:
> >>>> ────────────────────────────────────────────────────
> >>>> Coef. Std. Error z Pr(>|z|)
> >>>> ────────────────────────────────────────────────────
> >>>> (Intercept) -0.448496 0.100915 -4.44 <1e-05
> >>>> x1: 1 -0.0527684 0.0547746 -0.96 0.3354
> >>>> x2 0.0694393 0.0131541 5.28 <1e-06
> >>>> x3 -0.556903 0.0904798 -6.15 <1e-09
> >>>> ────────────────────────────────────────────────────
> >>>>
> >>>> julia> MixedModels.likelihoodratiotest(m02, m01)
> >>>> Model Formulae
> >>>> 1: y ~ 1 + x1 + x2 + x3 + (1 | id)
> >>>> 2: y ~ 1 + x1 + x2 + x3 + x4 + (1 | id)
> >>>> ──────────────────────────────────────────────────
> >>>> model-dof deviance χ² χ²-dof P(>χ²)
> >>>> ──────────────────────────────────────────────────
> >>>> [1] 5 2944.0654
> >>>> [2] 7 2900.8511 43.2144 2 <1e-09
> >>>> ──────────────────────────────────────────────────
> >>>>
> >>>> I would note that your data are so imbalanced with respect to id
> >>>> that it
> >>>> is not surprising that you get unstable results. (I changed your id
> >>>> column
> >>>> from integers to a factor so that 33 becomes S033.)
> >>>>
> >>>> 331×2 DataFrame
> >>>> Row │ id nrow
> >>>> │ String Int64
> >>>> ─────┼───────────────
> >>>> 1 │ S033 1227
> >>>> 2 │ S134 46
> >>>> 3 │ S295 45
> >>>> 4 │ S127 41
> >>>> 5 │ S125 33
> >>>> 6 │ S228 31
> >>>> 7 │ S193 23
> >>>> 8 │ S064 18
> >>>> 9 │ S281 16
> >>>> 10 │ S055 13
> >>>> 11 │ S035 13
> >>>> 12 │ S091 12
> >>>> 13 │ S175 11
> >>>> 14 │ S284 10
> >>>> 15 │ S159 10
> >>>> ⋮ │ ⋮ ⋮
> >>>> 317 │ S324 1
> >>>> 318 │ S115 1
> >>>> 319 │ S192 1
> >>>> 320 │ S201 1
> >>>> 321 │ S156 1
> >>>> 322 │ S202 1
> >>>> 323 │ S067 1
> >>>> 324 │ S264 1
> >>>> 325 │ S023 1
> >>>> 326 │ S090 1
> >>>> 327 │ S195 1
> >>>> 328 │ S170 1
> >>>> 329 │ S241 1
> >>>> 330 │ S189 1
> >>>> 331 │ S213 1
> >>>> 301 rows omitted
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Aug 31, 2023 at 3:02 AM Juho Kristian Ruohonen <
> >>>> juho.kristian.ruohonen using gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I thought it was impossible for deviance to decrease when a term is
> >>>>> removed!?!? Yet I'm seeing it happen with this pair of relatively
> >>>>> simple
> >>>>> Bernoulli GLMMs fit using lme4:glmer():
> >>>>>
> >>>>>> full <- glmer(y ~ (1|id) + x1 + x2 + x3 + x4, family = binomial,
> >>>>>> nAGQ =
> >>>>>> 6, data = anon)
> >>>>>> reduced <- update(full, ~. -x1)
> >>>>>> c(full = deviance(full), reduced = deviance(reduced))
> >>>>>
> >>>>>      full  reduced
> >>>>>  2808.671 2807.374
> >>>>>
> >>>>> What on earth is going on? FYI, I am deliberately comparing conditional
> >>>>> deviances rather than marginal ones, because quite a few of the
> >>>>> clusters
> >>>>> are of inherent interest and likely to recur in future data.
> >>>>>
> >>>>> My anonymized datafile is attached.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Juho
> >>>>> _______________________________________________
> >>>>> R-sig-mixed-models using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>>
>