[R-sig-ME] Statistical significance of random-effects (lme4 or others)

Alday, Phillip Ph||||p@A|d@y @end|ng |rom mp|@n|
Mon Sep 7 09:52:21 CEST 2020

I'm replying to the last email in the thread, but I'm addressing several
issues raised along the way:

* I'm not sure you should call it a _likelihood_ ratio test for REML
fitted models given that REML isn't _the_ likelihood. (Every couple of
months this topic comes up, so I won't go into it here.)

* That said, you can actually compare REML fitted models *if the fixed
effects are identical*. (This is actually part of why REML is viewed
with some skepticism by some -- the computation of the objective is
dependent on the parameterization of the fixed effects.)

* In all cases, the usual tool for model comparison (LRT) require nested
models. This is why it's tricky to test different random-effects
structures against each other: it can be quite easy to have non nested

* The p/2 for LRT on the random effects comes from the standard LRT
being a two-sided test, but because variances are bounded at zero, you
actually need a one-sided test.

* The variance components can actually be zero; such singular fits are
mathematically well defined. (Being able to estimate such models was one
of the advances in lme4 compared to its predecessor nlme.) A estimate of
zero doesn't mean that there is between group (e.g. here: between
school) variance, but rather that the variance between schools is not
distinguishable from the variability induced by the residual variance.
That sounds quite technical, but in more practical terms means "a zero
estimate means that you can't detect any variation between groups beyond
the residual variation; the between-group variation may be truly zero or
you may simply lack the power to detect it".

* anova() will work for comparing lme4 fits to OLS fits, but the lme4
fit has to be first:

> anova(m1, ols1)
refitting model(s) with ML (instead of REML)
Data: hsb
ols1: math ~ 1
m1: math ~ 1 + (1 | sch.id)
     npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
ols1    2 48104 48117 -24050    48100
m1      3 47122 47142 -23558    47116 983.92  1  < 2.2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

(For the technically curious: you have to specify the lme4 object first
due to the way single-dispatch works in R.)

* I would advise against significance tests and standard errors for (the
estimates of) the random effects because their (sampling) distribution
can be highly skewed and significance tests don't capture this.

* Instead, you should use confidence intervals. Profile confidence
intervals are okay, but the gold standard are from the parametric
bootstrap. Here are two examples from the OP's original code:

> confint(m1, method="profile", oldNames=FALSE)
Computing profile confidence intervals ...
                          2.5 %    97.5 %
sd_(Intercept)|sch.id  2.594729  3.315880
sigma                  6.154803  6.361786
(Intercept)           12.156289 13.117121
> confint(m1, method="boot", oldNames=FALSE)
Computing bootstrap confidence intervals ...
                          2.5 %    97.5 %
sd_(Intercept)|sch.id  2.551532  3.272484
sigma                  6.151822  6.363059
(Intercept)           12.091033 13.123430

see ?confint.merMod for more details about other possible arguments.

* Finally, be very careful interpreting p-values. The correct
interpretation is very difficult and tricky.


On 7/9/20 8:29 am, Juho Kristian Ruohonen wrote:
> While I agree with Daniel that an assumption of zero random-effect variance
> doesn't make much sense, I would point out that in my understanding, a
> simple LRT comparing a model with and without the random effect, where
> applicable, tests exactly that. The p-value divided by 2 reflects the
> probability of a deviance difference equal to or greater than the one
> observed between the models, under the null hypothesis that the variance
> component equals zero. A very low p/2 means a random-effect variance of 0
> is very implausible.
> Best,
> J
> ma 7. syysk. 2020 klo 9.13 Daniel Lüdecke (d.luedecke using uke.de) kirjoitti:
>> Hi Simon,
>> I'm not sure if this is a useful question. The variance can / should never
>> be negative, and it usually is always above 0 if you have some variation in
>> your outcome depending on the group factors (random effects).
>> Packages I know that do some "significance testing" or uncertainty
>> estimation of random effects are lmerTest::ranova() (quite well documented
>> what it does) or "arm::se.ranef()" resp.
>> "parameters::standard_error(effects
>> = "random")". The two latter packages compute standard errors for the
>> conditional modes of the random effects (what you get with "ranef()").
>> Best
>> Daniel
>> -----Ursprüngliche Nachricht-----
>> Von: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> Im
>> Auftrag von Simon Harmel
>> Gesendet: Montag, 7. September 2020 06:28
>> An: Juho Kristian Ruohonen <juho.kristian.ruohonen using gmail.com>
>> Cc: r-sig-mixed-models <r-sig-mixed-models using r-project.org>
>> Betreff: Re: [R-sig-ME] Statistical significance of random-effects (lme4 or
>> others)
>> Dear J,
>> My goal is not to do any comparison between any models. Rather, for each
>> model I want to know if the variance component is different from 0 or not.
>> And what is a p-value for that.
>> On Sun, Sep 6, 2020 at 11:21 PM Juho Kristian Ruohonen <
>> juho.kristian.ruohonen using gmail.com> wrote:
>>> A non-statistician's two cents:
>>>    1. I'm not sure likelihood-ratio tests (LRTs) are valid at all for
>>>    models fit using REML (rather than MLE). The anova() function seems to
>>>    agree, given that its present version (4.0.2) refits the models using
>> MLE
>>>    in order to compare their deviances.
>>>    2. Even when the models have been fit using MLE, likelihood-ratio
>>>    tests for variance components are only applicable in cases of a single
>>>    variance component. In your case, this means a LRT can only be used
>> for
>> *m1
>>>    vs ols1* and *m2 vs ols2*. There, you simply divide the p-value
>>>    reported by *anova(m1, ols1) *and *anova(m2, ols2)* by two. Both are
>>>    obviously extremely statistically significant. However, models *m3
>> *and
>>>    *m4* both have two random effects. The last time I checked, the
>>>    default assumption of a chi-squared deviance is no longer applicable
>> in
>>>    such cases, so the p-values reported by Stata and SPSS are only
>> approximate
>>>    and tend to be too conservative. Perhaps you might apply an
>> information
>>>    criterion instead, such as the AIC
>> <
>> https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#can-i-use-aic-for-m
>> ixed-models-how-do-i-count-the-number-of-degrees-of-freedom-for-a-random-eff
>> <https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#can-i-use-aic-for-mixed-models-how-do-i-count-the-number-of-degrees-of-freedom-for-a-random-eff>
>> ect>
>>>    .
>>> Best,
>>> J
>> --
>> _____________________________________________________________________
>> Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
>> Rechts; Gerichtsstand: Hamburg | www.uke.de
>> Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Joachim
>> Prölß, Prof. Dr. Blanche Schwappach-Pignataro, Marya Verdel
>> _____________________________________________________________________
> 	[[alternative HTML version deleted]]
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

More information about the R-sig-mixed-models mailing list