[R-sig-ME] lme4/glmer convergence warnings

W Robert Long longrob604 at gmail.com
Thu Apr 10 23:35:18 CEST 2014


Hi Ben

Thanks for clarifying. I get

 > fitted_model@optinfo$derivs$gradient
  [1]   -0.04070906   -0.41672993   -0.72225653  -32.57950953  -61.30328647
       -56.38420303   -0.64343470   -0.34225110   -0.03854723 4847.74510014
         0.27245700

 > dd <- update(fitted_model,devFunOnly=TRUE)
 > params <- unlist(getME(fitted_model,c("theta","beta")))
 > grad(dd,params)
  [1]   -0.04076175   -0.41673067   -0.72230322  -32.88266857  -63.62821306
       -56.99385033   -0.64343614   -0.34216063   -0.03854783 -835.66161982
         0.27250186

So one large discrepancy (the tenth element: 4847.7 from lme4's internal code vs. -835.7 from numDeriv), a few smaller ones (elements 4-6), and the rest pretty close.
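(For anyone who wants to reproduce this comparison end to end, here is a self-contained sketch on the built-in cbpp data -- an illustration only, not my model; it assumes lme4 and numDeriv are installed:)

```r
library(lme4)      # glmer(), getME()
library(numDeriv)  # grad()

## toy logistic GLMM on the built-in cbpp data (illustration only)
fm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

## gradient from lme4's internal finite-difference code
g_internal <- fm@optinfo$derivs$gradient

## same gradient recomputed with numDeriv's Richardson method
dd     <- update(fm, devFunOnly = TRUE)
params <- unlist(getME(fm, c("theta", "beta")))
g_numderiv <- grad(dd, params)

## large element-wise discrepancies here would point to the internal
## derivative code rather than to the fit itself
cbind(g_internal, g_numderiv)
```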

Please advise.

Thanks again
Rob

On 10/04/2014 21:32, Ben Bolker wrote:
> On 14-04-10 04:28 PM, W Robert Long wrote:
>> Hi Ben
>>
>> Could you clarify what you mean by
>>
>>>     params <- getME(fitted_model,"theta")
>>> ## or unlist(getME(fitted_model,c("theta","fixef"))) for GLMMs
>>>     grad(dd,params)
>>
>> This is a logistic model, so I assume I need to use unlist; however, it
>> seems that "fixef" is not an available component of fitted_model, because:
>> Error in match.arg(name) :
>>    'arg' should be one of “X”, “Z”, “Zt”, “Ztlist”, “y”, “mu”, “u”, “b”,
>> “Gp”, “Tp”, “L”, “Lambda”, “Lambdat”, “Lind”, “A”, “RX”, “RZX”, “sigma”,
>> “flist”, “beta”, “theta”, “ST”, “REML”, “is_REML”, “n_rtrms”, “n_rfacs”,
>> “cnms”, “devcomp”, “offset”, “lower”
>>
>> Also, grad() does not seem to be an available function.
>>
>> Please advise
>>
>> Thanks
>> Rob
>
>    Oops, too much of a hurry.
>
>   * We just added 'fixef' as a synonym for 'beta' in getME, but it's only
> on github.  unlist(getME(fitted_model,c("theta","beta"))) should work ...
>   * grad() is in the numDeriv package -- install/load it first, or use
> numDeriv::grad() if it's installed and you don't feel like loading it.
>
>    cheers
>      Ben Bolker
>
>>
>>
>>
>>
>> On 10/04/2014 20:08, Ben Bolker wrote:
>>> On 14-04-10 04:38 AM, W Robert Long wrote:
>>>> Hi Ben
>>>>
>>>> For my model, I get
>>>>
>>>>>     max(abs(relgrad))
>>>> [1] 1.081706
>>>>
>>>> Does this help ?
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>
>>>     Thanks.
>>>      Unfortunately it means there's still something we don't understand.
>>> To sum up:
>>>
>>>     * we calculate the finite-difference gradients and Hessians with
>>> respect to the model parameters at the estimated MLE (or restricted MLE).
>>>
>>>     - these might be poor estimates; we are using our own hand-rolled,
>>> naive central finite difference code rather than the better tested code
>>> in the numDeriv package, which uses the more expensive but more accurate
>>> Richardson method.  To see if this is the problem, compare
>>>
>>>     fitted_model@optinfo$derivs$gradient
>>>
>>> to
>>>
>>>     dd <- update(fitted_model,devFunOnly=TRUE)
>>>     params <- getME(fitted_model,"theta")
>>> ## or unlist(getME(fitted_model,c("theta","fixef"))) for GLMMs
>>>     grad(dd,params)
>>>
>>> and see if they differ significantly.  I mostly have *not* seen a
>>> discrepancy in these, although Rune Haubo has reported some
>>> (though I think those were in the Hessian rather than the
>>> gradient ... ?)
>>>
>>>     It might make sense to compare the Hessian in @optinfo with
>>> hessian(dd,params) from numDeriv as well.
>>>
>>>    * we then check those gradients, or the relative gradients from
>>>
>>>     with(fitted_model@optinfo$derivs,solve(Hessian,gradient))
>>>
>>> against some tolerance.  It makes sense that large gradients at the
>>> estimated parameters (where they should be zero) are a problem, but it
>>> continues to be unclear to me exactly what scale they should be
>>> small on, or how we should make this comparison ...
>>>
>>>     It might make sense to see if John Nash's optfntools package (on
>>> r-forge) has any useful tools or ideas, although as I recall (vaguely)
>>> those implementations of (e.g.) the KKT criteria
>>> <http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions>
>>>
>>> were pretty sensitive ...
>>>
>>>    * as a crude test of whether the actual value you got is a false
>>> convergence (I know in this case you don't think it is, but including
>>> this for completeness) the things I know to do are (1) try re-fitting
>>> with different optimizers, either starting from the putative best fit or
>>> from scratch (see
>>> <https://github.com/lme4/lme4/blob/master/misc/issues/allFit.R>); (2)
>>> try re-starting the _same_ optimizer from the putative best fit; (3)
>>> explore the likelihood surface (e.g. with profiling or bbmle::slice2D).
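>>> For (1) and (2), a minimal sketch (on the built-in cbpp data for
>>> illustration; substitute your own model and data):

```r
library(lme4)

## illustration model on the built-in cbpp data (not the model under discussion)
fm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

## starting values taken from the putative best fit
start <- list(theta = getME(fm, "theta"), fixef = fixef(fm))

## (1) re-fit with a different optimizer from the putative optimum
fm_bobyqa <- update(fm, start = start,
                    control = glmerControl(optimizer = "bobyqa"))

## (2) re-start the *same* optimizer from the putative optimum
fm_restart <- update(fm, start = start)

## if all three log-likelihoods agree closely, the warning is
## more likely a false positive
c(logLik(fm), logLik(fm_bobyqa), logLik(fm_restart))
```

>>> (The allFit.R script linked above automates check (1) across all
>>> available optimizers.)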
>>>
>>>     Ben Bolker
>>>
>>>
>>>
>>>
>>>
>>>> On 10/04/2014 03:33, Ben Bolker wrote:
>>>>> Ben Bolker <bbolker at ...> writes:
>>>>>
>>>>>>
>>>>>> On 14-04-06 04:31 AM, Tibor Kiss wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> being somewhat nonplussed by similar messages, I also applied
>>>>>>     Ben's recent suggestion to one of my models
>>>>>> to get:
>>>>>>>
>>>>>>>         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>>>>>>> 1.343e-05 3.530e-05 5.756e-05 7.631e-05 9.841e-05 1.932e-04
>>>>>>>
>>>>>>> So following up on Rob's message: What does it mean?
>>>>>>>
>>>>>>> With kind regards
>>>>>>>
>>>>>>> Tibor
>>>>>>>
>>>>>>
>>>>>>      It means that on the scale of the _standard deviations_ of the
>>>>>> parameters, the estimated gradients at the MLE (or restricted MLE) are
>>>>>> not large.  I was surprised in Rob's case that these scaled gradients
>>>>>> were not smaller; they were much smaller than without the scaling, but
>>>>>> not small enough to make me think I really understand what's going on.
>>>>>>
>>>>>>      To recapitulate: the appearance of all of these new messages in
>>>>>> the
>>>>>> latest version of lme4 is **not** due to a degradation or change in
>>>>>> the
>>>>>> optimization or fitting procedure -- it's due to a new set of
>>>>>> convergence tests that we implemented, that we think are giving a
>>>>>> lot of
>>>>>> false positives.  You can easily shut them off yourself, or raise the
>>>>>> tolerance for the warnings (see ?lmerControl/?glmerControl).  As
>>>>>> developers, we're a bit stuck now because we don't want to turn the
>>>>>> warnings _off_ until we understand the circumstances that are
>>>>>> triggering
>>>>>> them better, and that takes more time and effort than we've been
>>>>>> able to
>>>>>> muster so far.
>>>>>
>>>>>      [much context snipped]
>>>>>
>>>>>      Just to follow up on this: more technical discussion is going on at
>>>>> https://github.com/lme4/lme4/issues/120 ... at present, it is looking
>>>>> like scaling the gradient by the Hessian is going to solve a lot of
>>>>> problems.  If you are experiencing convergence warnings about
>>>>> max|grad| that you suspect are false positives, it would be a great
>>>>> help if you could try
>>>>>
>>>>>      relgrad <- with(fitted_model@optinfo$derivs,solve(Hessian,gradient))
>>>>>      max(abs(relgrad))
>>>>>
>>>>> check if the result is a small number (e.g. < 0.001) and report **one
>>>>> way or the other** on this list, or at the Github URL above, or
>>>>> (least preferred) by e-mailing lme4-authors at lists.r-forge.r-project.org.
>>>>> We also hope that this test *will* pick up the cases where people have
>>>>> reported problems with Nelder-Mead not working properly ...
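>>>>> For example, on the built-in cbpp data (illustration only):

```r
library(lme4)

## toy GLMM (illustration only); substitute your own warning-producing fit
fm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

## gradient scaled by the Hessian, as suggested above
relgrad <- with(fm@optinfo$derivs, solve(Hessian, gradient))
max(abs(relgrad))  # small values (e.g. < 0.001) suggest a false positive
```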
>>>>>
>>>>>      Ben Bolker
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
>>>
>>
>


