[R-sig-ME] Maximal random-effects lmer not converging

Sun Feb 2 14:13:22 CET 2014

Hi Ben, thanks for your message. Here's a sample of the error messages I get:
Warning message:
In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations

(the specific number of evaluations varies; I've tried it with as many
as 1,000,000 and still get failure to converge.)

Regarding your second point: yes, that is basically my question. I want to
know how to identify the variance accounted for by a whole factor when all
I see in the summary is the variances for the individual dummy-coded
coefficients. For example, say I have a model with one continuous
predictor and one 4-level factor predictor (sample attached), and I
put in random slopes for both of those. If the model doesn't converge,
I need to know whether to remove the random slope for the continuous
predictor or the random slope for the factor. But in the lmer summary,
I will get one variance estimate for the continuous predictor, and
three for the various components of that four-level factor. How do I
know, then, which predictor to not include random slopes for?
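
To make the situation concrete, here is a minimal base-R sketch (not my
real data; the data frame `d` and its columns `x` and `f` are made up for
illustration). As far as I understand, lmer builds the random-slope
columns for a term like (x + f | Subject) from the same kind of design
matrix that model.matrix() produces, so under the default treatment
coding the 4-level factor contributes three columns while the continuous
predictor contributes one:

```r
# Hypothetical toy design: one continuous predictor and one 4-level factor.
d <- data.frame(
  x = c(0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9),
  f = factor(rep(c("baseline", "B", "C", "D"), times = 2),
             levels = c("baseline", "B", "C", "D"))
)

# Design matrix for ~ x + f under R's default treatment (dummy) coding.
X <- model.matrix(~ x + f, data = d)
colnames(X)

# attr(X, "assign") maps each column back to its source term:
# 0 = intercept, 1 = x, 2 = f.
cols_per_term <- table(attr(X, "assign"))
cols_per_term
```

The three columns fB, fC and fD all belong to the single term f, which
is why the summary reports three variances for the factor and only one
for x.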

Best,
Steve

Stephen Politzer-Ahles
New York University, Abu Dhabi
Neuroscience of Language Lab
http://www.nyu.edu/projects/politzer-ahles/

>
>
> Date: Fri, 31 Jan 2014 22:20:29 +0000 (UTC)
> From: Ben Bolker <bbolker at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Maximal random-effects lmer not converging
>
> Stephen Politzer-Ahles <spa268 at ...> writes:
>
> >
> > Hello,
> >
> > I am trying to model a somewhat complicated dataset (which includes a
> > 2x2x4 interaction) with maximal random effects, based on the
> > suggestions from Barr et al. (2013). The maximal model is of course
> > not converging, and there are several things I don't understand about
> > how to proceed.
> >
> > 1. I've seen several suggestions that, when a model fails to converge,
> > you should look at the non-convergent model and then kick out
> > whichever random slope accounted for the least variance. But since my
> > model includes a four-level factor, I get different variances for each
> > level of the factor (and the problem is compounded by the interaction
> > terms, see the snippet below; there are also other random effects for
> > control variables, which I have not shown):
>
>   When you say "not converging", what do you mean exactly?  Are you getting
> warnings, and if so what are they (precisely)?  Or are you stating the
> fact that you're getting estimates of random-effects variances that are
> effectively zero, or estimates of correlations that are +/- 1?
>
> > Random effects:
> >  Groups   Name              Variance  Std.Dev.  Corr
> >  Subject  Factor1a:Factor2a 1.553e-08 1.246e-04
> >           Factor1b:Factor2a 2.000e-08 1.414e-04 0.69
> >           Factor1a:Factor2b 7.322e-09 8.557e-05 0.69 0.99
> >           Factor1b:Factor2b 2.624e-08 1.620e-04 0.55 0.70 0.71
> >           Factor1a:Factor2c 5.017e-08 2.240e-04 0.41 0.65 0.65 0.89
> >           Factor1b:Factor2c 2.220e-08 1.490e-04 0.25 0.48 0.55 0.78 0.90
> >           Factor1a:Factor2d 3.972e-08 1.993e-04 0.50 0.67 0.72 0.93 0.94 0.95
> >           Factor1b:Factor2d 1.642e-08 1.282e-04 0.36 0.79 0.78 0.83 0.81 0.71 0.81
> >
> >
> > So how do I evaluate the amount of variance accounted for by a
> > particular factor (or interaction), in order to determine which ones
> > to remove from the model?
>
>   Well, this is *one* component of the variance structure -- there's no
> way to drop one part of it.  (You can't say "I want to fit an interaction
> among A, B, and C, but I want to drop the B:C term" -- or at least it's
> difficult and unlikely to be sensible).  You could try (Factor1+Factor2|Subject)
> instead of (Factor1:Factor2|Subject) -- that would reduce this block from
> an 8x8 variance-covariance matrix (dimension=nlevels(1)*nlevels(2)) to a 5x5
> (nlevels(1)+nlevels(2)-1) variance-covariance matrix, or from 8*9/2=36
> parameters to 5*6/2=15 ...
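
The parameter counts above can be checked with a few lines of base R.
This is only a sketch of the arithmetic; the helper name `n_cov_par` is
made up, and the level counts (2 for Factor1, 4 for Factor2) are read
off the random-effects table quoted earlier:

```r
# Number of free parameters in an unstructured k x k covariance matrix:
# k variances plus k * (k - 1) / 2 covariances.
n_cov_par <- function(k) k * (k + 1) / 2

n1 <- 2  # levels of Factor1 (a, b)
n2 <- 4  # levels of Factor2 (a, b, c, d)

k_interaction <- n1 * n2      # (Factor1:Factor2 | Subject): one column per cell
k_additive    <- n1 + n2 - 1  # (Factor1 + Factor2 | Subject): intercept + contrasts

n_cov_par(k_interaction)  # 36
n_cov_par(k_additive)     # 15
```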
>
> >
> > 2. I am trying to model the random effects structure without
> > correlations, since I'm having a hard time getting convergence. Barr
> > et al. (2013) suggest that if you're not using correlations, then the
> > factors should be coded with deviation coding rather than treatment
> > coding. However, deviation coding does not make theoretical sense for
> > the variables I'm looking at; my design has a 4-level factor, and one
> > of those is a 'baseline' level against which I want to compare the
> > other three (my dependent measure is reaction times, and I want to see
> > which conditions are faster than baseline). So in this case should I
> > estimate the model with deviation coding, and then use post-hoc tests
> > (with something like glht from the multcomp package) later on to compare
> > conditions somehow?
> > Or just go ahead using treatment coding instead of deviation coding?
>
>    Can't help you with this one without spending a lot more time thinking
> about it.  Sorry.  The fundamental problem is that when you force correlations
> to zero, the predictions about what's going on at any particular combination
> of factor levels then depend on the coding -- they are no longer invariant
> to the coding chosen ...
>
> >
> > Thank you,
> > Steve
> >
> > Stephen Politzer-Ahles
> > New York University, Abu Dhabi
> > Neuroscience of Language Lab
> > http://www.nyu.edu/projects/politzer-ahles/
> >
> >
>