[R-sig-ME] lme4/glmer convergence warnings
W Robert Long
longrob604 at gmail.com
Thu Apr 10 23:35:18 CEST 2014
Hi Ben
Thanks for clarifying. I get
> fitted_model@optinfo$derivs$gradient
[1] -0.04070906 -0.41672993 -0.72225653 -32.57950953
-61.30328647 -56.38420303 -0.64343470 -0.34225110 -0.03854723
4847.74510014 0.27245700
> dd <- update(fitted_model,devFunOnly=TRUE)
> params <- unlist(getME(fitted_model,c("theta","beta")))
> grad(dd,params)
[1] -0.04076175 -0.41673067 -0.72230322 -32.88266857
-63.62821306 -56.99385033 -0.64343614 -0.34216063 -0.03854783
-835.66161982 0.27250186
So one large discrepancy, one smaller one, and the rest pretty close.
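One way to line the two up side by side (a sketch, reusing dd and
params from above, with the numDeriv package loaded):

g.internal <- fitted_model@optinfo$derivs$gradient
g.numDeriv <- grad(dd, params)
cbind(internal = g.internal, numDeriv = g.numDeriv,
      absdiff = abs(g.internal - g.numDeriv))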
Please advise.
Thanks again
Rob
On 10/04/2014 21:32, Ben Bolker wrote:
> On 14-04-10 04:28 PM, W Robert Long wrote:
>> Hi Ben
>>
>> Could you clarify what you mean by
>>
>>> params <- getME(fitted_model,"theta")
>>> ## or unlist(getME(fitted_model,c("theta","fixef"))) for GLMMs
>>> grad(dd,params)
>>
>> This is a logistic model, so I assume I need to use unlist, however it
>> seems that "fixef" is not a structure in fitted_model because:
>> Error in match.arg(name) :
>> 'arg' should be one of “X”, “Z”, “Zt”, “Ztlist”, “y”, “mu”, “u”, “b”,
>> “Gp”, “Tp”, “L”, “Lambda”, “Lambdat”, “Lind”, “A”, “RX”, “RZX”, “sigma”,
>> “flist”, “beta”, “theta”, “ST”, “REML”, “is_REML”, “n_rtrms”, “n_rfacs”,
>> “cnms”, “devcomp”, “offset”, “lower”
>>
>> Also, grad() does not seem to be an available function.
>>
>> Please advise
>>
>> Thanks
>> Rob
>
> Oops, too much of a hurry.
>
> * We just added 'fixef' as a synonym for 'beta' in getME, but it's only
> on github. unlist(getME(fitted_model,c("theta","beta"))) should work ...
> * grad() is in the numDeriv package -- install/load it first, or use
> numDeriv::grad() if it's installed and you don't feel like loading it.
>
> cheers
> Ben Bolker
>
>>
>>
>>
>>
>> On 10/04/2014 20:08, Ben Bolker wrote:
>>> On 14-04-10 04:38 AM, W Robert Long wrote:
>>>> Hi Ben
>>>>
>>>> For my model, I get
>>>>
>>>>> max(abs(relgrad))
>>>> [1] 1.081706
>>>>
>>>> Does this help ?
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>
>>> Thanks.
>>> Unfortunately it means there's still something we don't understand.
>>> To sum up:
>>>
>>> * we calculate the finite-difference gradients and Hessians with
>>> respect to the model parameters at the estimated MLE (or restricted MLE).
>>>
>>> - these might be poor estimates; we are using our own hand-rolled,
>>> naive central finite difference code rather than the better tested code
>>> in the numDeriv package, which uses the more expensive but more accurate
>>> Richardson method. To see if this is the problem, compare
>>>
>>> fitted_model@optinfo$derivs$gradient
>>>
>>> to
>>>
>>> dd <- update(fitted_model,devFunOnly=TRUE)
>>> params <- getME(fitted_model,"theta")
>>> ## or unlist(getME(fitted_model,c("theta","fixef"))) for GLMMs
>>> grad(dd,params)
>>>
>>> and see if they differ significantly. I mostly have *not* seen a
>>> discrepancy in these, although Rune Haubo has reported some
>>> discrepancies (though I think those were in the Hessian rather than
>>> the gradient ... ?)
>>>
>>> It might make sense to compare the Hessian in @optinfo with
>>> hessian(dd,params) from numDeriv as well.
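>>>
>>> A sketch of that comparison (assuming numDeriv is installed and that
>>> dd and params are defined as above):
>>>
>>> H.internal <- fitted_model@optinfo$derivs$Hessian
>>> H.numDeriv <- numDeriv::hessian(dd, params)
>>> max(abs(H.internal - H.numDeriv))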
>>>
>>> * we then check those gradients, or the relative gradients from
>>>
>>> with(fitted_model@optinfo$derivs,solve(Hessian,gradient))
>>>
>>> against some tolerance. It makes sense that large gradients at the
>>> estimated parameters (where they should be zero) are a problem, but it
>>> continues to be unclear to me on exactly what scale they should be
>>> small/how we should make this comparison ...
>>>
>>> It might make sense to see if John Nash's optfntools package (on
>>> r-forge) has any useful tools or ideas, although as I recall (vaguely)
>>> those implementations of (e.g.) the KKT criteria
>>> <http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions>
>>>
>>> were pretty sensitive ...
>>>
>>> * as a crude test of whether the actual value you got is a false
>>> convergence (I know in this case you don't think it is, but including
>>> this for completeness) the things I know to do are (1) try re-fitting
>>> with different optimizers, either starting from the putative best fit or
>>> from scratch (see
>>> <https://github.com/lme4/lme4/blob/master/misc/issues/allFit.R>); (2)
>>> try re-starting the _same_ optimizer from the putative best fit; (3)
>>> explore the likelihood surface (e.g. with profiling or bbmle::slice2D).
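>>>
>>> For (1) and (2), something along these lines (a sketch; the exact
>>> form of start= depends on the model type, see ?glmer/?lmer):
>>>
>>> ## (2) restart the same optimizer from the putative best fit
>>> best <- list(theta = getME(fitted_model, "theta"),
>>>              fixef = fixef(fitted_model))
>>> refit_same <- update(fitted_model, start = best)
>>> ## (1) re-fit with a different optimizer
>>> refit_other <- update(fitted_model,
>>>                       control = glmerControl(optimizer = "bobyqa"))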
>>>
>>> Ben Bolker
>>>
>>>
>>>
>>>
>>>
>>>> On 10/04/2014 03:33, Ben Bolker wrote:
>>>>> Ben Bolker <bbolker at ...> writes:
>>>>>
>>>>>>
>>>>>> On 14-04-06 04:31 AM, Tibor Kiss wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> being somewhat nonplussed by similar messages, I also applied
>>>>>>> Ben's recent suggestion to one of my models to get:
>>>>>>>
>>>>>>> Min. 1st Qu. Median Mean 3rd Qu. Max.
>>>>>>> 1.343e-05 3.530e-05 5.756e-05 7.631e-05 9.841e-05 1.932e-04
>>>>>>>
>>>>>>> So following up on Rob's message: What does it mean?
>>>>>>>
>>>>>>> With kind regards
>>>>>>>
>>>>>>> Tibor
>>>>>>>
>>>>>>
>>>>>> It means that on the scale of the _standard deviations_ of the
>>>>>> parameters, the estimated gradients at the MLE (or restricted MLE) are
>>>>>> not large. I was surprised in Rob's case that these scaled gradients
>>>>>> were not that small; much smaller than without the scaling, but not
>>>>>> small enough to make me think I really understand what's going on.
>>>>>>
>>>>>> To recapitulate: the appearance of all of these new messages in the
>>>>>> latest version of lme4 is **not** due to a degradation or change in
>>>>>> the optimization or fitting procedure -- it's due to a new set of
>>>>>> convergence tests that we implemented, which we think are giving a
>>>>>> lot of false positives. You can easily shut them off yourself, or
>>>>>> raise the tolerance for the warnings (see ?lmerControl/?glmerControl).
>>>>>> As developers, we're a bit stuck now because we don't want to turn
>>>>>> the warnings _off_ until we understand the circumstances that are
>>>>>> triggering them better, and that takes more time and effort than
>>>>>> we've been able to muster so far.
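>>>>>>
>>>>>> For example (a sketch -- check ?glmerControl for the exact
>>>>>> argument names in your installed version), something like
>>>>>>
>>>>>> glmerControl(check.conv.grad = .makeCC("warning", tol = 0.01))
>>>>>>
>>>>>> passed as the control= argument raises the gradient-check
>>>>>> tolerance, and action = "ignore" in .makeCC() turns the check
>>>>>> off entirely.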
>>>>>
>>>>> [much context snipped]
>>>>>
>>>>> Just to follow up on this: more technical discussion is going on at
>>>>> https://github.com/lme4/lme4/issues/120 ... at present, it is looking
>>>>> like scaling the gradient by the Hessian is going to solve a lot of
>>>>> problems. If you are experiencing convergence warnings about
>>>>> max|grad| that you suspect are false positives, it would be a great
>>>>> help if you could try
>>>>>
>>>>> relgrad <-
>>>>> with(fitted_model@optinfo$derivs,solve(Hessian,gradient))
>>>>> max(abs(relgrad))
>>>>>
>>>>> check if the result is a small number (e.g. <0.001) and report **one
>>>>> way or the other** on this list, or at the Github url above, or
>>>>> (least preferred) by e-mailing lme4-authors at lists.r-forge.r-project.org
>>>>> We also hope that this test *will* pick up the cases where people have
>>>>> reported problems with Nelder-Mead not working properly ...
>>>>>
>>>>> Ben Bolker
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
>>>
>>
>