[R-sig-ME] Point estimate outside of its confidence interval with lmer

Fri May 13 17:18:53 CEST 2022

Thanks Ben & Wolfgang for your explainations.

I'll try to prepare an anonymised and selected dataset to check that
it is the only reason of this discrepancy.

Wolfgang, by bootstrap as a check, I meant that since bootstrapped
dataset are refited using REML, the sampled sigma are from REML hence
IC based on them should include the observed one — and this is the
case, in favor of a REML/ML discrepancy.  But that could also be that
the khi-square approximation is just false, indeed.  IC are not much
larger by bootsrap, however, instead shifted when comparing boostrap
IC for REML and ML fits.

Best regards,

Le Thu, May 12, 2022 at 04:01:04PM +0000, Viechtbauer, Wolfgang (NP) a écrit :
> Hi Ben,
> 
> There is nothing unusual about profiling the restricted likelihood and in fact, LRTs and profile likelihood CIs for variance components tend to perform slightly better. See, for example:
> 
> Morrell, C. H. (1998). Likelihood ratio testing of variance components in the linear mixed-effects model using restricted maximum likelihood. Biometrics, 54(4), 1560-1568. 
> 
> This is also discussed in various books on mixed-effects models. For example, Verbeke & Molenberghs' "Linear Mixed Models for Longitudinal Data" (see section 6.3.2: "Further, in contrast to the LR test for fixed effects (see Section 6.2.5), valid LR tests are also obtained under REML estimation. [...]"). So, if one can conduct a LRT for variance components under REML estimation, one can also profile the restricted likelihood and use this to construct CIs.
> 
> Of course, the usual caveats apply when the parameter tested falls on the boundary of the parameter space.
> 
> Best,
> Wolfgang
> 
> >-----Original Message-----
> >From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces using r-project.org] On
> >Behalf Of Ben Bolker
> >Sent: Thursday, 12 May, 2022 15:55
> >To: r-sig-mixed-models using r-project.org
> >Subject: Re: [R-sig-ME] Point estimate outside of its confidence interval with
> >lmer
> >
> >   For what it's worth, there is a (relatively ancient) Github issue
> >discussing this:
> >
> >https://github.com/lme4/lme4/issues/325
> >
> >   Further evidence that we ought to add a warning, or at least a
> >message, saying that the profiles are based on the ML fit.
> >
> >   Can anyone point me to some theory that supports/describes the
> >process of constructing CIs based on the profile of the REML statistic ... ?
> >
> >   cheers
> >    Ben Bolker
> >
> >
> >On 2022-05-12 7:01 a.m., Viechtbauer, Wolfgang (NP) wrote:
> >> Dear Emmanuel,
> >>
> >> Please see below for my responses.
> >>
> >> Best,
> >> Wolfgang
> >>
> >>> -----Original Message-----
> >>> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces using r-project.org] On
> >>> Behalf Of Emmanuel Curis
> >>> Sent: Thursday, 12 May, 2022 9:30
> >>> To: r-sig-mixed-models using r-project.org
> >>> Subject: [R-sig-ME] Point estimate outside of its confidence interval with
> >lmer
> >>>
> >>> Hello,
> >>>
> >>> I have encountered an unexpected result for some datasets when using confint
> >>> after fitting a model with lmer: the confidence intervals for the standard
> >>> deviations in the model did not included the point estimate, given by summary
> >>> for instance.
> >>>
> >>> I think the problem is a mix of small sample size and REML vs ML, but I would
> >>> be happy to have confirmation that my interpretation is correct, since I'm
> >>> not very familiar with profiling and REML vs ML... and wonder if something
> >>> more problematic is occuring.
> >>>
> >>> My interpretation is that lmer fits using REML by default,
> >>
> >> Correct.
> >>
> >>> hence the variances are estimated "unbiased" (the equivalent of dividing by n
> >>> - k for the residual variance of a linear model, but I'm not sure exactly
> >>> what is n and k for random effects variance estimation).
> >>
> >> Only for some very special cases can we show that the variance components are
> >> estimated unbiasedly. However, yes, REML estimation tends to provide estimates
> >> of variance components that are approximately unbiased in many cases.
> >>
> >> Also, for more complex models, changing a ML estimate into a REML one isn't
> >> simply a multiplication of the ML estimate by some factor that depends on
> >> things like the sample size (and in more complex models, number of clusters)
> >> and the number of parameters.
> >>
> >>> But when confint is called, using profile, the ML profile is used, hence
> >>> variance confidence intervals are estimated "biased" (the equivalent
> >>> of dividing by n).  However, when n is small, dividing by n - k and n
> >>> may give very different results, hence in some cases the profiled
> >>> confidence interval for ML estimate does not include the REML
> >>> estimate, for the standard deviations.  I guess in practice the
> >>> difference between REML and ML is more complex than just using n or
> >>> n-k, but would it be idea?
> >>
> >> Correct - see above. So I wouldn't just heuristically apply some 'correction'
> >> that only comes from simple regression models.
> >>
> >> Actually, when constructing a profile likelihood CI for a variance component
> >> (which is what confint() does by default for lmer model objects), one can also
> >> profile the restricted log likelihood function, which would guarantee that the
> >> estimate falls into the CI. I actually thought that this is what confint()
> >> does for lmer objects (fitted with REML), but I checked and you are correct -
> >> even when the model was fitted with REML, all profile likelihood CIs
> >> (including those for variance components) are based on the regular likelihood.
> >>
> >> Without a reproducible example, I cannot tell you whether this is really the
> >> reason for the estimate falling outside of the CI, but it certainly could be.
> >>
> >>> Support for this is that
> >>> 1) when using bootstrap, there seems to be no such discrepancy
> >>> 2) when fitting using REML = FALSE, the profiled IC is the same and
> >>>     does include the (different) point estimate
> >>>
> >>> Does it sound correct?
> >>
> >> Not sure how 1) helps to diagnose this, but 2) certainly provides support for
> >> the hypothesis that the discrepancy arises from different likelihoods being
> >> used for estimation and profiling.
> >>
> >>> Additionnal questions, if this interpretation is correct:
> >>> - would it make sense to make confidence intervals based on REML
> >>>    profiles, and not ML profiles? if so, how?
> >>
> >> As noted above, this can be done for variance components, but I don't see a
> >> way of doing this with confint() for lmer model objects.
> >>
> >>> - wouldn't a warning be a good idea when point estimates are outside
> >>>    CI, with the explaination if it is indeed REML vs ML?
> >>
> >> That's up to the developers of lme4 to decide.
> >>
> >>> - if this is indeed a "small sample size" problem, I guess in such
> >>>    cases any asymptotic result is difficult to trust, right?  Does it
> >>>    mean profiled interval cannot be trusted also, neither nested
> >>>    models tests, and that only bootstrap may be used?
> >>
> >> Profiling the likelihood relies on the asymptotic behavior of the likelihood
> >> ratio test. So yes, profiling likelihood CIs may not have a nominal coverage
> >> rate in small samples (for some definition of 'small', which is context
> >> dependent).
> >>
> >>> - in such cases, is there any argument to prefer REML over ML or
> >>>    vice-versa?
> >>
> >> One can make arguments in favor of both ML and REML. REML tends to provide
> >> approximately unbiased estimates, but with larger mean-squared error, while ML
> >> tends to provide negative biased estimates, but with smaller MSE.
> >>
> >>> Thanks in advance for your help,
> >>>
> >>> --
> >>>                                 Emmanuel CURIS
> >>>                                 emmanuel.curis using parisdescartes.fr
> >>>
> >>> Page WWW: http://emmanuel.curis.online.fr/index.html
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> >--
> >Dr. Benjamin Bolker
> >Professor, Mathematics & Statistics and Biology, McMaster University
> >Director, School of Computational Science and Engineering
> >(Acting) Graduate chair, Mathematics & Statistics
> >
> >_______________________________________________
> >R-sig-mixed-models using r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
                                Emmanuel CURIS
                                emmanuel.curis using parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html