[R-sig-ME] Point estimate outside of its confidence interval with lmer

Thu May 12 15:54:56 CEST 2022

   For what it's worth, there is a (relatively ancient) Github issue 
discussing this:

https://github.com/lme4/lme4/issues/325

   Further evidence that we ought to add a warning, or at least a 
message, saying that the profiles are based on the ML fit.

   Can anyone point me to some theory that supports/describes the 
process of constructing CIs based on the profile of the REML statistic ... ?

   cheers
    Ben Bolker

On 2022-05-12 7:01 a.m., Viechtbauer, Wolfgang (NP) wrote:
> Dear Emmanuel,
> 
> Please see below for my responses.
> 
> Best,
> Wolfgang
> 
>> -----Original Message-----
>> From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces using r-project.org] On
>> Behalf Of Emmanuel Curis
>> Sent: Thursday, 12 May, 2022 9:30
>> To: r-sig-mixed-models using r-project.org
>> Subject: [R-sig-ME] Point estimate outside of its confidence interval with lmer
>>
>> Hello,
>>
>> I have encountered an unexpected result for some datasets when using confint
>> after fitting a model with lmer: the confidence intervals for the standard
>> deviations in the model did not included the point estimate, given by summary
>> for instance.
>>
>> I think the problem is a mix of small sample size and REML vs ML, but I would
>> be happy to have confirmation that my interpretation is correct, since I'm
>> not very familiar with profiling and REML vs ML... and wonder if something
>> more problematic is occuring.
>>
>> My interpretation is that lmer fits using REML by default,
> 
> Correct.
> 
>> hence the variances are estimated "unbiased" (the equivalent of dividing by n
>> - k for the residual variance of a linear model, but I'm not sure exactly
>> what is n and k for random effects variance estimation).
> 
> Only for some very special cases can we show that the variance components are
> estimated unbiasedly. However, yes, REML estimation tends to provide estimates
> of variance components that are approximately unbiased in many cases.
> 
> Also, for more complex models, changing a ML estimate into a REML one isn't
> simply a multiplication of the ML estimate by some factor that depends on
> things like the sample size (and in more complex models, number of clusters)
> and the number of parameters.
> 
>> But when confint is called, using profile, the ML profile is used, hence
>> variance confidence intervals are estimated "biased" (the equivalent
>> of dividing by n).  However, when n is small, dividing by n - k and n
>> may give very different results, hence in some cases the profiled
>> confidence interval for ML estimate does not include the REML
>> estimate, for the standard deviations.  I guess in practice the
>> difference between REML and ML is more complex than just using n or
>> n-k, but would it be idea?
> 
> Correct - see above. So I wouldn't just heuristically apply some 'correction'
> that only comes from simple regression models.
> 
> Actually, when constructing a profile likelihood CI for a variance component
> (which is what confint() does by default for lmer model objects), one can also
> profile the restricted log likelihood function, which would guarantee that the
> estimate falls into the CI. I actually thought that this is what confint()
> does for lmer objects (fitted with REML), but I checked and you are correct -
> even when the model was fitted with REML, all profile likelihood CIs
> (including those for variance components) are based on the regular likelihood.
> 
> Without a reproducible example, I cannot tell you whether this is really the
> reason for the estimate falling outside of the CI, but it certainly could be.
> 
>> Support for this is that
>> 1) when using bootstrap, there seems to be no such discrepancy
>> 2) when fitting using REML = FALSE, the profiled IC is the same and
>>     does include the (different) point estimate
>>
>> Does it sound correct?
> 
> Not sure how 1) helps to diagnose this, but 2) certainly provides support for
> the hypothesis that the discrepancy arises from different likelihoods being
> used for estimation and profiling.
> 
>> Additionnal questions, if this interpretation is correct:
>> - would it make sense to make confidence intervals based on REML
>>    profiles, and not ML profiles? if so, how?
> 
> As noted above, this can be done for variance components, but I don't see a
> way of doing this with confint() for lmer model objects.
> 
>> - wouldn't a warning be a good idea when point estimates are outside
>>    CI, with the explaination if it is indeed REML vs ML?
> 
> That's up to the developers of lme4 to decide.
> 
>> - if this is indeed a "small sample size" problem, I guess in such
>>    cases any asymptotic result is difficult to trust, right?  Does it
>>    mean profiled interval cannot be trusted also, neither nested
>>    models tests, and that only bootstrap may be used?
> 
> Profiling the likelihood relies on the asymptotic behavior of the likelihood
> ratio test. So yes, profiling likelihood CIs may not have a nominal coverage
> rate in small samples (for some definition of 'small', which is context
> dependent).
> 
>> - in such cases, is there any argument to prefer REML over ML or
>>    vice-versa?
> 
> One can make arguments in favor of both ML and REML. REML tends to provide
> approximately unbiased estimates, but with larger mean-squared error, while ML
> tends to provide negative biased estimates, but with smaller MSE.
> 
>> Thanks in advance for your help,
>>
>> --
>>                                 Emmanuel CURIS
>>                                 emmanuel.curis using parisdescartes.fr
>>
>> Page WWW: http://emmanuel.curis.online.fr/index.html
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics