[R-sig-ME] Convergence problem example

Paul Johnson pauljohn32 at gmail.com
Thu Apr 26 02:17:41 CEST 2018


Dear Ben

I understand. Help us think of ways we can help you.

I wonder why you don't take the easy route: have glmer run the
convergence test; if it fails, change the optimizer to bobyqa, and if
THAT silences the convergence warning, report that result. That's
what I'd do, if I knew how :)
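For what it's worth, here is a rough sketch of that retry logic in R.
The helper name refit_if_nonconverged is invented, and it relies on
m@optinfo$conv$lme4$messages, which is where current lme4 versions
store the post-hoc convergence messages:

```r
library(lme4)

## Hypothetical helper: if lme4's post-hoc convergence check complained
## about a fit, refit with bobyqa and keep the bobyqa fit only if it
## comes back clean; otherwise return the original fit.
refit_if_nonconverged <- function(m) {
    msgs <- m@optinfo$conv$lme4$messages
    if (is.null(msgs)) return(m)     # no convergence complaints, keep fit
    m2 <- update(m, control = glmerControl(optimizer = "bobyqa"))
    if (is.null(m2@optinfo$conv$lme4$messages)) m2 else m
}
```

So after the original call, m1 <- refit_if_nonconverged(m1) would
report the bobyqa result only when the refit is warning-free.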


Dear everybody else

Let's see if we can help!

Questions/Ideas

1. How often does this come up?

Casual googling suggests the problem was widespread two years ago;
there seem to be fewer posts about it now.

Do you think so?

2.  What do you think about making a survey for lme4 users, to find out
how often convergence warnings happen?  We could make a place to
attach files containing code and R objects.

If that survey existed, could we insert its address into the lme4
convergence warning, along with instructions on what to do?  I'm
thinking of something simple like "Help with diagnostics.  Run this:

saveRDS(the_data_frame, file = "your-last-name-20180425.rds")

and then paste the trouble-causing function call into the following box..."

Some cases involve private data, I understand, but I doubt they all do.

3. Should we archive & categorize the reproducible examples?

What would you say if we (I) made a GitHub project for
convergence-failure examples?  Self-contained examples. Maybe the one
I had today is example 1. I suppose we need 20 or 30 more, with a
variety of different warning messages.
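As a strawman for what a self-contained archive entry might look like
(the file names here are invented, and the .rds data file would ship
alongside the script):

```r
## example-001.R -- one archived convergence-warning example
library(lme4)

dairy <- readRDS("example-001-data.rds")  # data shipped with the script

## Fit that triggers "Model failed to converge with max|grad| = ..."
m1 <- glmer(fscr ~ lncfs + ai + heifer + (1 | cow),
            family = binomial(link = "logit"), nAGQ = 30,
            data = dairy)

m1@optinfo$conv$lme4$messages  # the warning text, for the archive record
sessionInfo()                  # record R and package versions
```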

I wish I had an example where the convergence diagnostic is correct
and the problem is not solved in a superficial way.  I'd be especially
enthusiastic if Stata or SAS failed to notice and reported
non-converged results for the same model.

I found a post that hits the high points for me:
https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
(I wish the author would put his/her name in it!)

pj

On Wed, Apr 25, 2018 at 1:47 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>   Just a quick comment.
>
>   As long-time readers of this forum may know (I think I've said it here
> before), the convergence warnings provided by lme4 are often
> over-sensitive.  They're based on *post hoc* attempts (after the
> nonlinear optimizer thinks it has converged) to evaluate the KKT
> criteria (in this case reducing to zero gradient, negative
> positive-definite Hessian) by computing the finite-difference
> approximations of the gradient and Hessian of the negative
> log-likelihood.  The problem is that this computation is just as subject
> to numeric error as the nonlinear optimization itself, so it doesn't
> work very well. (Some of this is stated in ?lme4::convergence)
>
>   Doug Bates feels that attempts to fix the problem are
> "toast-scraping", i.e. attempts to fix something that was a bad idea in
> the first place (i.e., I/we should throw away the metaphorical burned
> toast and scrap the post-hoc convergence tests, relying on the optimizer
> to tell us if it thinks it has converged successfully). The Catch-22 is
> that (1) I'm afraid to throw away a system that might catch a reasonable
> fraction of truly bad convergence cases; (2) I don't want to flip-flop
> (take the tests out, then decide that they were a good idea after all
> and reintroduce them); (3) making an *informed* decision what to do
> (e.g. throwing out the tests or setting a more appropriate threshold)
> would require a lot of work generating test cases -- and in order to get
> a really good handle on the problem, we'd have to generate not just
> "nice" test cases but all kinds of pathological examples where there
> really are convergence problems.
>
>   Besides being hard and tedious, this is time I don't really have.
> Volunteers ... ?
>
>   cheers
>    Ben Bolker
>
> On 2018-04-25 11:12 AM, Phillip Alday wrote:
>> Changing the optimizer seems to address the convergence warnings and has
>> no impact on the other aspects of the fit:
>>
>>> m1 <- update(m1, control=glmerControl(optimizer="bobyqa"))
>>> summary(m1)
>> Generalized linear mixed model fit by maximum likelihood (Adaptive
>>   Gauss-Hermite Quadrature, nAGQ = 30) [glmerMod]
>>  Family: binomial  ( logit )
>> Formula: fscr ~ lncfs + ai + heifer + (1 | cow)
>>    Data: dairy
>> Control: glmerControl(optimizer = "bobyqa")
>>
>>      AIC      BIC   logLik deviance df.resid
>>   4017.8   4047.9  -2003.9   4007.8     3022
>>
>> Scaled residuals:
>>     Min      1Q  Median      3Q     Max
>> -1.8136 -0.7652 -0.6479  1.0420  1.6213
>>
>> Random effects:
>>  Groups Name        Variance Std.Dev.
>>  cow    (Intercept) 0.3419   0.5847
>> Number of obs: 3027, groups:  cow, 1575
>>
>> Fixed effects:
>>                   Estimate Std. Error z value Pr(>|z|)
>> (Intercept)       -1.55696    0.43481  -3.581 0.000343 ***
>> lncfs              0.52222    0.10007   5.218  1.8e-07 ***
>> aiai              -1.09560    0.12341  -8.878  < 2e-16 ***
>> heiferprimiparous -0.08259    0.09718  -0.850 0.395418
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Correlation of Fixed Effects:
>>             (Intr) lncfs  aiai
>> lncfs       -0.965
>> aiai        -0.193 -0.046
>> heifrprmprs -0.054  0.016 -0.051
>>
>> Phillip
>>
>>
>>
>> On 04/25/2018 04:56 PM, Paul Johnson wrote:
>>> In the book Multilevel and Longitudinal Modeling using Stata,
>>> Rabe-Hesketh and Skrondal have a lot of exercises and over the years
>>> I've been trying to write Stata and R code to demonstrate
>>> similarities/differences.
>>>
>>> I've run into an example where Stata (either with builtin methods or
>>> the addon gllamm) seems to think it gets estimates but lme4
>>> diagnostics say there is a convergence failure.  I want to impose on
>>> you and get your advice about it. We see convergence warnings quite often
>>> with lme4, but they are usually the "rescale your variables" errors,
>>> not as blunt as this.
>>>
>>> The first part retrieves "dairy.dta" from the book website
>>>
>>> library(foreign)
>>> library(lme4)
>>>
>>> fn <- "dairy"
>>> if (!file.exists(paste0(fn, ".dta12"))) {
>>>     download.file(paste0("http://www.stata-press.com/data/mlmus3/", fn, ".dta"),
>>>                   destfile = paste0(fn, ".dta12"))
>>> }
>>>
>>> dairy <- read.dta(paste0(fn, ".dta12"))
>>>
>>> #1
>>> m1 <- glmer(fscr ~ lncfs + ai + heifer + (1 | cow),
>>>             family = binomial(link = "logit"), nAGQ = 30,
>>>             data = dairy)
>>> summary(m1)
>>>
>>> From that, I see:
>>>> m1 <- glmer(fscr ~ lncfs + ai + heifer + (1 | cow),
>>> +             family = binomial(link = "logit"), nAGQ = 30,
>>> +             data = dairy)
>>> Warning messages:
>>> 1: 'rBind' is deprecated.
>>>  Since R version 3.2.0, base's rbind() should work fine with S4 objects
>>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>>   Model failed to converge with max|grad| = 0.00396932 (tol = 0.001,
>>> component 1)
>>>> summary(m1)
>>> Generalized linear mixed model fit by maximum likelihood (Adaptive
>>>   Gauss-Hermite Quadrature, nAGQ = 30) [glmerMod]
>>>  Family: binomial  ( logit )
>>> Formula: fscr ~ lncfs + ai + heifer + (1 | cow)
>>>    Data: dairy
>>>
>>>      AIC      BIC   logLik deviance df.resid
>>>   4017.8   4047.9  -2003.9   4007.8     3022
>>>
>>> Scaled residuals:
>>>     Min      1Q  Median      3Q     Max
>>> -1.8136 -0.7652 -0.6479  1.0420  1.6213
>>>
>>> Random effects:
>>>  Groups Name        Variance Std.Dev.
>>>  cow    (Intercept) 0.3419   0.5847
>>> Number of obs: 3027, groups:  cow, 1575
>>>
>>> Fixed effects:
>>>                   Estimate Std. Error z value Pr(>|z|)
>>> (Intercept)       -1.55693    0.43481  -3.581 0.000343 ***
>>> lncfs              0.52221    0.10007   5.218 1.81e-07 ***
>>> aiai              -1.09560    0.12341  -8.878  < 2e-16 ***
>>> heiferprimiparous -0.08259    0.09718  -0.850 0.395410
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>> Correlation of Fixed Effects:
>>>             (Intr) lncfs  aiai
>>> lncfs       -0.965
>>> aiai        -0.193 -0.046
>>> heifrprmprs -0.054  0.016 -0.051
>>> convergence code: 0
>>> Model failed to converge with max|grad| = 0.00396932 (tol = 0.001, component 1)
>>>
>>>
>>> Stata output for comparison purposes, using gllamm:
>>>
>>> . gllamm fscr lncfs ai heifer, i(cow) link(logit) fam(binom) adapt
>>>
>>> Running adaptive quadrature
>>> Iteration 0:    log likelihood = -2004.6011
>>> Iteration 1:    log likelihood = -2003.9085
>>> Iteration 2:    log likelihood = -2003.9069
>>>
>>>
>>> Adaptive quadrature has converged, running Newton-Raphson
>>> Iteration 0:   log likelihood = -2003.9069
>>> Iteration 1:   log likelihood = -2003.9069
>>> Iteration 2:   log likelihood = -2003.9064
>>> Iteration 3:   log likelihood = -2003.9064
>>>
>>> number of level 1 units = 3027
>>> number of level 2 units = 1575
>>>
>>> Condition Number = 47.123877
>>>
>>> gllamm model
>>>
>>> log likelihood = -2003.9064
>>>
>>> ------------------------------------------------------------------------------
>>>         fscr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>        lncfs |   .5222157   .1000693     5.22   0.000     .3260834     .718348
>>>           ai |  -1.095598   .1234099    -8.88   0.000    -1.337477   -.8537193
>>>       heifer |  -.0825878   .0971811    -0.85   0.395    -.2730593    .1078837
>>>        _cons |  -1.556961   .4348008    -3.58   0.000    -2.409155   -.7047673
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> Variances and covariances of random effects
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> ***level 2 (cow)
>>>
>>>     var(1): .34188062 (.1263136)
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> Using the newer meglm that is provided with Stata
>>>
>>> . meglm fscr lncfs ai heifer || cow: , family(binom) link(logit)
>>>
>>> Fitting fixed-effects model:
>>>
>>> Iteration 0:   log likelihood = -2011.8253
>>> Iteration 1:   log likelihood = -2009.1421
>>> Iteration 2:   log likelihood = -2009.1412
>>> Iteration 3:   log likelihood = -2009.1412
>>>
>>> Refining starting values:
>>>
>>> Grid node 0:   log likelihood = -2015.6021
>>>
>>> Fitting full model:
>>>
>>> Iteration 0:   log likelihood = -2015.6021
>>> Iteration 1:   log likelihood =  -2006.709
>>> Iteration 2:   log likelihood = -2003.9174
>>> Iteration 3:   log likelihood = -2003.9065
>>> Iteration 4:   log likelihood = -2003.9065
>>>
>>> Mixed-effects GLM                               Number of obs     =      3,027
>>> Family:                binomial
>>> Link:                     logit
>>> Group variable:             cow                 Number of groups  =      1,575
>>>
>>>                                                 Obs per group:
>>>                                                               min =          1
>>>                                                               avg =        1.9
>>>                                                               max =          5
>>>
>>> Integration method: mvaghermite                 Integration pts.  =          7
>>>
>>>                                                 Wald chi2(3)      =     103.86
>>> Log likelihood = -2003.9065                     Prob > chi2       =     0.0000
>>> ------------------------------------------------------------------------------
>>>         fscr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>        lncfs |   .5222154   .1000694     5.22   0.000      .326083    .7183479
>>>           ai |  -1.095597   .1234095    -8.88   0.000    -1.337476   -.8537191
>>>       heifer |  -.0825878   .0971811    -0.85   0.395    -.2730593    .1078836
>>>        _cons |  -1.556961   .4348012    -3.58   0.000    -2.409155   -.7047659
>>> -------------+----------------------------------------------------------------
>>> cow          |
>>>    var(_cons)|   .3418776   .1263105                      .1657237    .7052721
>>> ------------------------------------------------------------------------------
>>> LR test vs. logistic model: chibar2(01) = 10.47       Prob >= chibar2 = 0.0006
>>>
>>>
>>> As far as I can see, parameter estimates are the same. I did not
>>> compare conditional modes.
>>>
>>> If lme4's converge warning is valid, then we have a concrete case
>>> where Stata is reporting the wrong thing.  I can see some upside there
>>> :)
>>>
>>> R details
>>>
>>>> sessionInfo()
>>> R version 3.4.4 (2018-03-15)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 17.10
>>>
>>> Matrix products: default
>>> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
>>> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] lme4_1.1-15    Matrix_1.2-14  foreign_0.8-69
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] minqa_1.2.4     MASS_7.3-49     compiler_3.4.4  tools_3.4.4
>>>  [5] Rcpp_0.12.15    splines_3.4.4   nlme_3.1-137    grid_3.4.4
>>>  [9] nloptr_1.0.4    lattice_0.20-35
>>>
>>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>



-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.


