[R-sig-ME] Problems with convergence

Ben Bolker bbolker at gmail.com
Mon Mar 30 22:21:20 CEST 2015


Ken Beath <ken.beath at ...> writes:

> Yes, I was demonstrating that it fails to converge and then, as a
> consequence, fails to profile. I have my doubts about convergence for
> the bobyqa algorithm; I have other applications where it doesn't
> converge properly. For some of my own work I've used nlminb followed
> by Nelder-Mead if there is a convergence failure. Not optimal, but it
> seems to work.

   I'm still not sure whether you expect it to converge (I think you
do), or whether you are just pointing out that the convergence warning
in this case is probably justified (in the face of so many convergence
warnings that turn out to be false positives, this is a useful piece
of information).
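
  For what it's worth, lme4 does let you swap optimizers without leaving
glmer, which makes it fairly cheap to check whether a warning is a false
positive.  A rough sketch (untested here; assumes a recent lme4 plus the
optimx package, and which glmerControl options are available depends on
the lme4 version):

```r
## refit the same model under two optimizers and compare deviances;
## a large gap suggests a genuine convergence failure rather than a
## false-positive warning
library(lme4)
library(optimx)

fit_bobyqa <- glmer(cbind(nEvents, total - nEvents) ~ trt + factor(id) +
                      (0 + trt | id),
                    data = thedata, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))

## nlminb via optimx -- roughly the nlminb-first strategy you describe
fit_nlminb <- update(fit_bobyqa,
                     control = glmerControl(optimizer = "optimx",
                                            optCtrl = list(method = "nlminb")))

## compare the fits
c(bobyqa = deviance(fit_bobyqa), nlminb = deviance(fit_nlminb))
```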

> While it is fairly heavily parameterised, it is a real model: a
> frequentist implementation of Smith, T. C., Spiegelhalter, D. J., &
> Thomas, A. (1995). Bayesian approaches to random-effects meta-analysis:
> a comparative study. Statistics in Medicine, 14(24), 2685–99. The
> reason for having studies as fixed effects is probably philosophical:
> the overall success rates are not likely to be given by normally
> distributed random effects, and are in many cases specifically chosen.

  I can appreciate that, but I still think it's unrealistic to expect
to be able to fit 22 parameters to 40 observations except under very
special circumstances.  One point about switching from the Bayesian
to the frequentist world is that the Bayesians (by definition) put
priors on their parameters, which provides a degree of regularization
that is not by default available to frequentist methods.  What priors
did Smith et al. use?  It might be worth trying this in blme with
priors on the fixed effects ...
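
  A rough sketch of what I mean (untested; the prior standard deviation
here is an arbitrary choice, and the exact fixef.prior syntax depends on
the blme version):

```r
## blme's bglmer adds maximum a posteriori regularization on top of
## glmer; a weak normal prior on the fixed effects acts like a ridge
## penalty and can stabilize the 22-parameter fit
library(blme)
bglmer1 <- bglmer(cbind(nEvents, total - nEvents) ~ trt + factor(id) +
                    (0 + trt | id),
                  data = thedata, family = binomial,
                  fixef.prior = normal(sd = 2.5))
```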

> I did find that one of my own data sets also failed here, but it
> fitted with a commercial program that is based on the EM algorithm.
> For this type of problem EM is actually faster, as any type of
> quasi-Newton method needs to calculate lots of derivatives.

  I could whine about the difficulty of finding globally robust,
reliable, and fast optimization algorithms, but I won't.  I can certainly
appreciate that there are more reliable methods for particular
sub-classes of problems.

> Anyway, I'm going to keep looking at the methods, and eventually the
> code for glmer, and may have some suggestions.

  Would be happy to hear them.

  It's worth pointing out that lme4 is using a preliminary "nAGQ=0"
step, which ignores the terms contributed by the integrals over the
distributions of the conditional modes and as a result is able to
fit both the fixed-effect parameters and the conditional modes in
a single linear-algebra step, reducing the dimensionality of the
nonlinear optimization to the length of the variance-covariance
parameter vector ...
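
  To make that concrete, a quick sketch using the random-effects version
of the model (assumes thedata and the prop/weights setup from the code
below; untested as written):

```r
## nAGQ = 0 stops after the cheap preliminary step: fixed effects and
## conditional modes are handled in the penalized weighted least-squares
## step, so the nonlinear optimizer only sees the covariance parameters.
## nAGQ = 1 (the default Laplace approximation) then refines the fit.
library(lme4)
fit0 <- glmer(prop ~ trt + (1 | id) + (0 + trt | id),
              weights = total, data = thedata, family = binomial,
              nAGQ = 0)
fit1 <- update(fit0, nAGQ = 1)
## the fixed-effect estimates generally differ a little between the two
cbind(nAGQ0 = fixef(fit0), nAGQ1 = fixef(fit1))
```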
 
> On 19 March 2015 at 14:45, Ben Bolker <bbolker <at> gmail.com> wrote:
> 
> > Ken Beath <ken.beath <at> ...> writes:
> >
> > > The following code shows that there are convergence problem messages
> > > where there is a problem with convergence. The profiling shows that
> > > the maximum found is not the correct one. This is simulated data for
> > > a binary meta-analysis with fixed effect for study and random effect
> > > for treatment.
> >

 [paragraph snipped to try to make Gmane happy]

> >   However, may I comment that this is a slightly ridiculous scenario?
> > The data set here has 40 observations, and the model tries to fit 22
> > parameters.  The model that treats id as a random effect works much
> > better.  I can believe there are scenarios where you really do
> > want study as a fixed effect, but did you expect it to be practical
> > here?
> >
> > But maybe you're just trying to show that this is a "true positive"
> > case for the convergence warnings.
> >
> > Some random code I wrote while diagnosing what was going on:
> >
> > library(ggplot2); theme_set(theme_bw())
> >
> > ## proportion + weights is a little easier to handle
> > thedata <- transform(thedata,prop=nEvents/total)
> >
> > ggplot(thedata,aes(trt,prop))+geom_point(aes(size=total))+
> >     geom_line(aes(group=id),colour="gray")
> > glmer1 <- glmer(prop~trt+factor(id)+(0+trt|id),
> >                 weights=total,data=thedata,family=binomial)
> >
> > ## id as RE
> > glmer2 <- glmer(prop~trt+(1|id)+(0+trt|id),
> >                 weights=total,data=thedata,family=binomial)
> >
> > dd <- update(glmer1,devFunOnly=TRUE)
> > pars <- unlist(getME(glmer1,c("theta","fixef")))
> > library("bbmle")
> > ss <- slice2D(pars,dd)
> > library("lattice")
> > plot(ss)
> > ## too complex, but too much work to cut down significantly
> >
> > > library(lme4)
> > >
> > > thedata <- structure(list(nEvents=c(10L,53L,17L,18L,22L,6L,16L,
> > > 14L,13L,18L,15L,19L,52L,19L,8L,16L,50L,8L,9L,4L,
> > > 26L,45L,18L,20L,5L,16L,18L,7L,3L,19L,30L,26L,66L,
> > > 23L,29L,18L,72L,25L,9L,2L),total=c(200,200,200,200,
> > > 200,200,200,200,200,200,200,200,200,200,200,200,200,
> > > 200,200,200,200,200,200,200,200,200,200,200,200,200,
> > > 200,200,200,200,200,200,200,200,200,200),trt=c(0,
> > > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,
> > > 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),id=structure(c(1L,
> > > 2L,3L,4L,5L,6L,7L,8L,9L,10L,11L,12L,13L,14L,15L,
> > > 16L,17L,18L,19L,20L,1L,2L,3L,4L,5L,6L,7L,8L,9L,
> > > 10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,20L),.Label=c("1",
> > > "2","3","4","5","6","7","8","9","10","11","12","13",
> > > "14","15","16","17","18","19","20"),class="factor")),.Names=c("nEvents",
> > > "total","trt","id"),row.names=c(NA,40L),class="data.frame")
> > >
> > > glmer1 <- glmer(cbind(nEvents,total-nEvents)~trt+factor(id)+
> > >   (0+trt|id),data=thedata,family=binomial)
> > >
> > > # while glmer reports problems with component 9, it is component 8
> > > # that has a problem in the profile
> > > # I've used devtol so that the discrepancy is printed
> > > prof.glmer1 <- profile(glmer1,which=8,devtol=1.0e-3)




More information about the R-sig-mixed-models mailing list