[R-sig-ME] Point estimate outside of its confidence interval with lmer

Thu May 12 13:01:30 CEST 2022

Dear Emmanuel,

Please see below for my responses.

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces using r-project.org] On
>Behalf Of Emmanuel Curis
>Sent: Thursday, 12 May, 2022 9:30
>To: r-sig-mixed-models using r-project.org
>Subject: [R-sig-ME] Point estimate outside of its confidence interval with lmer
>
>Hello,
>
>I have encountered an unexpected result for some datasets when using confint
>after fitting a model with lmer: the confidence intervals for the standard
>deviations in the model did not include the point estimate, as given by
>summary for instance.
>
>I think the problem is a mix of small sample size and REML vs ML, but I would
>be happy to have confirmation that my interpretation is correct, since I'm
>not very familiar with profiling and REML vs ML... and wonder if something
>more problematic is occurring.
>
>My interpretation is that lmer fits using REML by default,

Correct.

>hence the variances are estimated "unbiased" (the equivalent of dividing by n
>- k for the residual variance of a linear model, but I'm not sure exactly
>what n and k are for random effects variance estimation).

Only for some very special cases can we show that the variance components are
estimated unbiasedly. However, yes, REML estimation tends to provide estimates
of variance components that are approximately unbiased in many cases.

Also, for more complex models, changing an ML estimate into a REML one isn't
simply a matter of multiplying the ML estimate by some factor that depends on
things like the sample size (and, in more complex models, the number of
clusters) and the number of parameters.
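
To illustrate, here is a minimal sketch that fits the same model with REML and
with ML and compares the variance components. It assumes the sleepstudy data
that ships with lme4 (not the data from the original question), and the
object names are only for illustration.

```r
# Sketch only: sleepstudy is an example dataset from lme4, used here as a
# stand-in for the (unavailable) data from the original question.
library(lme4)

fit_reml <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy,
                 REML = TRUE)
fit_ml   <- refitML(fit_reml)  # same model, refitted by maximum likelihood

vc_reml <- as.data.frame(VarCorr(fit_reml))
vc_ml   <- as.data.frame(VarCorr(fit_ml))

# One row per variance component (Subject intercept, Residual).
comparison <- data.frame(
  component = vc_reml$grp,
  var_reml  = vc_reml$vcov,
  var_ml    = vc_ml$vcov,
  ratio     = vc_ml$vcov / vc_reml$vcov
)
print(comparison)
```

If a single correction of the n versus n - k kind applied, the ratio column
would be the same for every variance component.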

>But when confint is called, using profile, the ML profile is used, hence
>variance confidence intervals are estimated "biased" (the equivalent
>of dividing by n).  However, when n is small, dividing by n - k and n
>may give very different results, hence in some cases the profiled
>confidence interval for ML estimate does not include the REML
>estimate, for the standard deviations.  I guess in practice the
>difference between REML and ML is more complex than just using n or
>n-k, but would that be the idea?

Correct - see above. So I wouldn't just heuristically apply some 'correction'
that only comes from simple regression models.

Actually, when constructing a profile likelihood CI for a variance component
(which is what confint() does by default for lmer model objects), one can also
profile the restricted log likelihood function, which would guarantee that the
estimate falls into the CI. I actually thought that this is what confint()
does for lmer objects (fitted with REML), but I checked and you are correct -
even when the model was fitted with REML, all profile likelihood CIs
(including those for variance components) are based on the regular likelihood.

Without a reproducible example, I cannot tell you whether this is really the
reason for the estimate falling outside of the CI, but it certainly could be.
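
A rough way to check this on a given model is to put the REML and ML
estimates next to the profile likelihood CIs. The sketch below again assumes
the sleepstudy example data from lme4, and it relies on the rows returned by
confint() and by VarCorr() being in the same order, which should hold for a
model with a single random intercept but is an assumption for anything more
complex.

```r
# Sketch only: sleepstudy stands in for the data from the original question.
library(lme4)

fit_reml <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy,
                 REML = TRUE)

# Profile likelihood CIs for the random-effect SD and the residual SD.
ci <- confint(fit_reml, parm = "theta_", method = "profile", quiet = TRUE)

# Point estimates on the SD scale from the REML fit and from an ML refit.
sd_reml <- as.data.frame(VarCorr(fit_reml))$sdcor
sd_ml   <- as.data.frame(VarCorr(refitML(fit_reml)))$sdcor

# Assumes rows are ordered as: Subject intercept SD, then residual SD.
check <- data.frame(
  lower       = ci[, 1],
  upper       = ci[, 2],
  sd_reml     = sd_reml,
  sd_ml       = sd_ml,
  reml_inside = sd_reml >= ci[, 1] & sd_reml <= ci[, 2],
  ml_inside   = sd_ml >= ci[, 1] & sd_ml <= ci[, 2]
)
print(check)
```

The ML estimates should always fall inside these intervals, whereas nothing
guarantees that for the REML estimates.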

>Support for this is that
> 1) when using bootstrap, there seems to be no such discrepancy
> 2) when fitting using REML = FALSE, the profiled CI is the same and
>    does include the (different) point estimate
>
>Does it sound correct?

Not sure how 1) helps to diagnose this, but 2) certainly provides support for
the hypothesis that the discrepancy arises from different likelihoods being
used for estimation and profiling.
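
Point 2) can be checked directly by profiling a REML fit and an ML fit of the
same model and comparing the resulting intervals. This is only a sketch,
again assuming the sleepstudy example data from lme4; small numerical
differences between the two sets of intervals are to be expected because the
optimizer starts from different values.

```r
# Sketch only: sleepstudy stands in for the data from the original question.
library(lme4)

fit_reml <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy,
                 REML = TRUE)
fit_ml   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy,
                 REML = FALSE)

# Profile likelihood CIs for the random-effect SD and the residual SD.
ci_reml <- confint(fit_reml, parm = "theta_", method = "profile", quiet = TRUE)
ci_ml   <- confint(fit_ml,   parm = "theta_", method = "profile", quiet = TRUE)

# Largest relative difference between the two sets of interval bounds.
max_rel_diff <- max(abs(ci_reml - ci_ml) / abs(ci_ml))
print(max_rel_diff)
```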

>Additional questions, if this interpretation is correct:
> - would it make sense to make confidence intervals based on REML
>   profiles, and not ML profiles? if so, how?

As noted above, this can be done for variance components, but I don't see a
way of doing this with confint() for lmer model objects.

> - wouldn't a warning be a good idea when point estimates are outside
>   CI, with the explanation if it is indeed REML vs ML?

That's up to the developers of lme4 to decide.

> - if this is indeed a "small sample size" problem, I guess in such
>   cases any asymptotic result is difficult to trust, right?  Does it
>   mean profiled intervals cannot be trusted either, nor nested
>   model tests, and that only bootstrap may be used?

Profiling the likelihood relies on the asymptotic behavior of the likelihood
ratio test. So yes, profile likelihood CIs may not attain their nominal
coverage rate in small samples (for some definition of 'small', which is
context dependent).

> - in such cases, is there any argument to prefer REML over ML or
>   vice-versa?

One can make arguments in favor of both ML and REML. REML tends to provide
approximately unbiased estimates, but with larger mean squared error (MSE),
while ML tends to provide negatively biased estimates, but with smaller MSE.

>Thanks in advance for your help,
>
>--
>                                Emmanuel CURIS
>                                emmanuel.curis using parisdescartes.fr
>
>Page WWW: http://emmanuel.curis.online.fr/index.html