[R-sig-ME] lme4/glmer convergence warnings

Ben Bolker bbolker at gmail.com
Thu Apr 10 21:08:07 CEST 2014


On 14-04-10 04:38 AM, W Robert Long wrote:
> Hi Ben
> 
> For my model, I get
> 
>>   max(abs(relgrad))
> [1] 1.081706
> 
> Does this help ?
> 
> Thanks
> Rob
> 

  Thanks.
   Unfortunately it means there's still something we don't understand.
To sum up:

  * we calculate the finite-difference gradients and Hessians with
respect to the model parameters at the estimated MLE (or restricted MLE).

  - these might be poor estimates; we are using our own hand-rolled,
naive central finite-difference code rather than the better-tested code
in the numDeriv package, which uses the more expensive but more accurate
Richardson method.  To see if this is the problem, compare

  fitted_model@optinfo$derivs$gradient

to

  library(numDeriv)
  dd <- update(fitted_model, devFunOnly=TRUE)
  params <- getME(fitted_model, "theta")
  ## or unlist(getME(fitted_model, c("theta","fixef"))) for GLMMs
  grad(dd, params)

and see if they differ significantly.  I have mostly *not* seen a
discrepancy between these, although Rune Haubo has reported some
discrepancies (though I think they were in the Hessian rather than the
gradient ... ?)
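
  For example (a minimal sketch, assuming `fitted_model` is the fitted
(g)lmer model and `dd`/`params` are defined as above):

  g1 <- fitted_model@optinfo$derivs$gradient  ## lme4's internal estimate
  g2 <- numDeriv::grad(dd, params)            ## Richardson extrapolation
  max(abs(g1 - g2))                           ## should be near zero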

  It might make sense to compare the Hessian in @optinfo with
hessian(dd,params) from numDeriv as well.
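
  Along the same lines (again just a sketch, same assumptions as above):

  H1 <- fitted_model@optinfo$derivs$Hessian  ## internal finite-difference Hessian
  H2 <- numDeriv::hessian(dd, params)
  max(abs(H1 - H2))  ## a large value would implicate the naive code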

 * we then check those gradients, or the relative gradients from

  with(fitted_model@optinfo$derivs, solve(Hessian, gradient))

against some tolerance.  It makes sense that large gradients at the
estimated parameters (where they should be zero) are a problem, but it
continues to be unclear to me exactly on what scale they should be
small, or how we should make this comparison ...
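
  In the meantime, the tolerance for this warning can be raised, or the
check switched off, via lmerControl()/glmerControl() (a sketch only:
the option names are as documented in ?lmerControl, and the tolerance
shown is arbitrary, not a recommended value):

  ## relax the max|grad| check (use glmerControl for GLMMs)
  ctrl <- lmerControl(check.conv.grad = .makeCC("warning", tol = 0.01))
  refit2 <- update(fitted_model, control = ctrl)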

  It might make sense to see if John Nash's optfntools package (on
r-forge) has any useful tools or ideas, although as I recall (vaguely)
those implementations of (e.g.) the KKT criteria
<http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions>
were pretty sensitive ...

 * as a crude test of whether the actual value you got is a false
convergence (I know in this case you don't think it is, but I'm
including this for completeness), the things I know to do are: (1) try
re-fitting with different optimizers, either starting from the putative
best fit or from scratch (see
<https://github.com/lme4/lme4/blob/master/misc/issues/allFit.R>); (2)
try re-starting the _same_ optimizer from the putative best fit; (3)
explore the likelihood surface (e.g. with profiling or bbmle::slice2D);
see the sketch below.
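
  A rough sketch of (1)-(3) (assuming allFit.R has been downloaded from
the URL above; `start` handling differs slightly between lmer and glmer):

  ## (1) refit with a suite of different optimizers and compare
  source("allFit.R")
  aa <- allFit(fitted_model)

  ## (2) restart the same optimizer from the putative best fit
  ##     (theta only, for an lmer fit; glmer also takes fixef starts)
  refit3 <- update(fitted_model, start = getME(fitted_model, "theta"))

  ## (3) profile the likelihood surface around the fit
  pp <- profile(fitted_model)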

  Ben Bolker





> On 10/04/2014 03:33, Ben Bolker wrote:
>> Ben Bolker <bbolker at ...> writes:
>>
>>>
>>> On 14-04-06 04:31 AM, Tibor Kiss wrote:
>>>> Hi,
>>>>
>>>> being somewhat nonplussed by similar messages, I also applied
>>>> Ben's recent suggestion to one of my models
>>>> to get:
>>>>
>>>>       Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>>>> 1.343e-05 3.530e-05 5.756e-05 7.631e-05 9.841e-05 1.932e-04
>>>>
>>>> So following up on Rob's message: What does it mean?
>>>>
>>>> With kind regards
>>>>
>>>> Tibor
>>>>
>>>
>>>    It means that on the scale of the _standard deviations_ of the
>>> parameters, the estimated gradients at the MLE (or restricted MLE) are
>>> not large.  I was surprised in Rob's case that these scaled gradients
>>> were not all that small: much smaller than without the scaling, but not
>>> small enough to make me think I really understand what's going on.
>>>
>>>    To recapitulate: the appearance of all of these new messages in the
>>> latest version of lme4 is **not** due to a degradation or change in the
>>> optimization or fitting procedure -- it's due to a new set of
>>> convergence tests that we implemented, that we think are giving a lot of
>>> false positives.  You can easily shut them off yourself, or raise the
>>> tolerance for the warnings (see ?lmerControl/?glmerControl).  As
>>> developers, we're a bit stuck now because we don't want to turn the
>>> warnings _off_ until we better understand the circumstances that are
>>> triggering them, and that takes more time and effort than we've been able to
>>> muster so far.
>>
>>    [much context snipped]
>>
>>    Just to follow up on this: more technical discussion is going on at
>> https://github.com/lme4/lme4/issues/120 ... at present, it is looking
>> like scaling the gradient by the Hessian is going to solve a lot of
>> problems.  If you are experiencing convergence warnings about
>> max|grad| that you suspect are false positives, it would be a great
>> help if you could try
>>
>>    relgrad <- with(fitted_model@optinfo$derivs, solve(Hessian, gradient))
>>    max(abs(relgrad))
>>
>> check if the result is a small number (e.g. < 0.001) and report **one
>> way or the other** on this list, or at the GitHub URL above, or
>> (least preferred) by e-mailing lme4-authors at lists.r-forge.r-project.org.
>> We also hope that this test *will* pick up the cases where people have
>> reported problems with Nelder-Mead not working properly ...
>>
>>    Ben Bolker


