[R-sig-ME] lmer nonconvergent: care to run and explain?
Paul Johnson
pauljohn32 at gmail.com
Wed Oct 14 19:02:13 CEST 2015
Every now and then I come back to my old project that simulates
multi-level data; I've asked about it here from time to time.
Today I noticed some unfamiliar convergence problems in lme4_1.1-9. I
updated to lme4_1.1-10 and got new and different problems. Rescaling
the variables did resolve the convergence issues under the previous
version, but it no longer does.
Here's my data simulation code. It appears to me that this should
work: there are lots of groups, with plenty of observations in each
one. Maybe my random-effect variance is too small, but it seems to me
that alone should not cause convergence problems. I've put Stata
output below for comparison.
Am I setting up conditions that make this ill-conditioned?
library(MASS) ## for mvrnorm
set.seed(1234)
## Clusters
M <- 100
## Minimum per cluster
Mmin <- 134
## Poisson mean (= variance) of extra members per cluster
Mvar <- 20
N <- Mmin + rpois(M, lambda = Mvar)
Mcolors <- rainbow(M)
## Create Mind and Iind indexes for bookkeeping.
Mind <- unlist(lapply(1:M, function(x) rep(x, each = N[x])))
Iind <- unlist(lapply(1:M, function(x) seq(1, N[x], by = 1)))
## Create 3 variables from a multivariate normal with these
## characteristics
Xmeans <- c(100, 200, 150)
Xsds <- c(20, 30, 40)
Xrho12 <- 0.4
Xrho13 <- 0.2
Xrho23 <- 0.0
Xcorr.mat <- diag(1, 3)
Xcorr.mat[1,2] <- Xcorr.mat[2,1] <- Xrho12
Xcorr.mat[1,3] <- Xcorr.mat[3,1] <- Xrho13
Xcorr.mat[2,3] <- Xcorr.mat[3,2] <- Xrho23
if (!isSymmetric(Xcorr.mat)) stop("Xcorr.mat is not symmetric")
Xsigma <- diag(Xsds) %*% Xcorr.mat %*% diag(Xsds)
X.mat <- mvrnorm(n = sum(N), mu = Xmeans, Sigma = Xsigma)
dimnames(X.mat)[[2]] <- c("x1", "x2", "x3")
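## Sanity check: Xsigma should be positive definite, and the simulated
## columns should roughly reproduce Xcorr.mat:
stopifnot(all(eigen(Xsigma, only.values = TRUE)$values > 0))
round(cor(X.mat), 2)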
## The true fixed effects of beta0, beta1, beta2, beta3 in
## y = beta0 + beta1*x1 + beta2*x2 + beta3*x3
beta <- c(1.5, 0.25, -0.2, 0.05)
## Standard deviation of error term at individual level
STDE <- 25
## Create a dependent variable that has no clustering effects.
## FIXED Part + RANDOM PART
y1 <- cbind(1, X.mat) %*% beta + rnorm(sum(N), mean = 0, sd = STDE)
dat <- cbind(data.frame(Mind, Iind, y1), as.data.frame(X.mat))
rownames(dat) <- paste(dat$Mind, dat$Iind, sep = ".") ## may help bookkeeping later
rm(Mind, Iind, y1, X.mat) ## cleanup workspace
summary(lm(y1 ~ x1 + x2 + x3, data=dat))
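## Since y1 has no cluster structure, pooled OLS should recover beta
## closely; compare side by side:
cbind(true = beta, ols = coef(lm(y1 ~ x1 + x2 + x3, data = dat)))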
## Layer on additive group level error in y2 and y3
## Parameters for random effects:
## STDEb0: standard deviation of clustered intercepts.
STDEb0 <- 0.5
## STDEb1: standard deviation of slopes across cluster units
STDEb1 <- 0.25
Mrho <- 0.2
## I'm tempted to get fancy here with a matrix for STDEb0, STDEb1, and
## Mrho. It would pay off, I expect, but it would be harder to teach.
Msigma <- diag(c(STDEb0, STDEb1)) %*% matrix(c(1, Mrho, Mrho, 1), 2, 2) %*%
    diag(c(STDEb0, STDEb1))
Mreffects <- mvrnorm(M, mu = c(0, 0), Sigma = Msigma)
colnames(Mreffects) <- c("b0", "b1")
## In y2, include random intercept
dat$y2 <- dat$y1 + Mreffects[ ,"b0"][dat$Mind]
## In y3, add in random slope effect
dat$y3 <- dat$y2 + dat$x1 * Mreffects[ ,"b1"][dat$Mind]
## Do some diagnostics on Mreffects
plot(Mreffects[ , 1], Mreffects[ , 2], xlab = "Intercept Random Effect",
     ylab = "Slope Random Effect", main = "True Random Effects")
## library(rockchalk)
## summarize(Mreffects)
summary(Mreffects)
apply(Mreffects, 2, sd)
cor(Mreffects)
plot(Mreffects[ , "b0"][dat$Mind], dat$x1 * Mreffects[ , "b1"][dat$Mind],
     col = dat$Mind,
     main = "Random Slopes and Intercepts",
     xlab = "Intercept Random Effect (b0)",
     ylab = "Slope Random Effect (b1*x)")
## The "grand" regression?
m1 <- lm(y3 ~ x1 + x2 + x3, data=dat)
summary(m1)
## The "dummy variable" regression?
m2 <- lm(y3 ~ x1 + x2 + x3 + as.factor(Mind), data=dat)
summary(m2)
library(lme4)
## M separate regressions, with 3 predictors
m3list <- lmList(y3 ~ x1 + x2 + x3 | Mind, data = dat, pool = FALSE)
## Predicted values set x2 and x3 at their cluster-specific means.
plot(y3 ~ x1, data = dat, col = Mcolors[Mind], lwd = 0.6,
     main = "lm on Separate Clusters")
for (i in seq_along(m3list)) {
    m3mf <- model.frame(m3list[[i]])  ## data set for group i
    x1range <- range(m3mf$x1)  ## use group-specific ranges this time
    pgroup <- predict(m3list[[i]],
                      newdata = data.frame(x1 = x1range, x2 = mean(m3mf$x2),
                                           x3 = mean(m3mf$x3)))
    lines(x1range, pgroup, col = Mcolors[i])
}
## If a cluster has 3 or fewer cases, this warning will appear:
## Warning message:
## In predict.lm(m3list[[i]], newdata = data.frame(x1 = x1range, x2 =
##   mean(m3mf$x2), :
##   prediction from a rank-deficient fit may be misleading
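## That cannot happen with these settings, since every cluster has at
## least Mmin cases, but the check for small clusters is cheap:
names(which(table(dat$Mind) <= 3))  ## character(0) here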
## Record keeping
m3newdat <- lapply(m3list, function(x) {
    m3mf <- model.frame(x)
    ndf <- data.frame(x1 = range(m3mf$x1), x2 = mean(m3mf$x2),
                      x3 = mean(m3mf$x3))
    ndf$m3pred <- predict(x, newdata = ndf)
    ndf
})
m3newdat <- do.call("rbind", m3newdat)
## Recover the Mind value by butchering the row names
m3newdat$Mind <- as.integer(gsub("\\.[0-9].*", "", row.names(m3newdat)))
## Draw new graphs on a new device, so we can compare
dev.new()
##
## Estimate this as a mixed-effects model with lme4
## mm2: Just the random intercept, no random slope
mm2 <- lmer(y2 ~ x1 + x2 + x3 + (1 | Mind), data=dat)
summary(mm2)
mm2VarCorr <- VarCorr(mm2)
mm2VarCorr
cor(fitted(mm2), dat$y2)
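## For comparing the estimated variance components against STDEb0 and
## STDE, the long-format data frame version of VarCorr() is handy
## (assuming this lme4 provides the as.data.frame method):
as.data.frame(mm2VarCorr)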
## Both random intercept and random slope, not correlated with each other
mm3 <- lmer(y3 ~ x1 + x2 + x3 + (1 | Mind) + (0 + x1 | Mind), data = dat,
            verbose = 3)
## That produces a non-convergent model. At return:
## eval: 479 fn: 153252.65 par: 0.0237800 0.00122944
## Warning messages:
## 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
##   Model failed to converge with max|grad| = 1.54298 (tol = 0.002,
##   component 1)
## 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
##   Model is nearly unidentifiable: very large eigenvalue
##   - Rescale variables?
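## Before rescaling, one standard thing to try is a different optimizer
## with a larger evaluation budget, via lmerControl:
mm3b <- update(mm3, control = lmerControl(optimizer = "bobyqa",
                                          optCtrl = list(maxfun = 1e5)))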
## OK, let's scale them
dat[ , c("x1s", "x2s", "x3s")] <- lapply(dat[, c("x1", "x2", "x3")], scale)
dat$y3s <- scale(dat$y3)
mm3s <- lmer(y3s ~ x1s + x2s + x3s + (1 | Mind) + (0 + x1s | Mind),
             data = dat, verbose = 3)
mm4 <- lmer(y3 ~ x1 + x2 + x3 + (x1 | Mind), data = dat, verbose = 2)
summary(mm4)
mm4s <- lmer(y3s ~ x1s + x2s + x3s + (x1s | Mind), data = dat, verbose = 2)
summary(mm4s)
Here's what I see at the end.
At return
eval: 453 fn: 34253.131 par: 1.45320e-05 0.00891596
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00204447 (tol = 0.002,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
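The max|grad| in that warning is a scaled gradient. Following the
recipe in lme4's convergence documentation, it can be recomputed by
hand to see how far the fit sits from the 0.002 tolerance:

dd <- mm4s@optinfo$derivs
max(abs(solve(dd$Hessian, dd$gradient)))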
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lme4_1.1-10 Matrix_1.2-2 MASS_7.3-44
loaded via a namespace (and not attached):
[1] compiler_3.2.2 minqa_1.2.4 tools_3.2.2 Rcpp_0.12.1
[5] splines_3.2.2 nlme_3.1-122 grid_3.2.2 nloptr_1.0.4
[9] lattice_0.20-33
Earlier today, with the previous lme4, the scaling of the variables
produced a convergent model.
Upon exporting the data to Stata 14, we observe:
. mixed y1 x1 x2 x3 || Mind: x1, cov(unstruct)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -71367.808
Iteration 1: log likelihood = -71366.333
Iteration 2: log likelihood = -71366.311
Iteration 3: log likelihood = -71366.308
Iteration 4: log likelihood = -71366.304
Iteration 5: log likelihood = -71366.304
Computing standard errors:
Mixed-effects ML regression Number of obs = 15419
Group variable: Mind Number of groups = 100
Obs per group: min = 146
avg = 154.2
max = 169
Wald chi2(3) = 1095.11
Log likelihood = -71366.304 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .2534893 .0110537 22.93 0.000 .2318245 .2751541
x2 | -.1914348 .0072503 -26.40 0.000 -.2056452 -.1772245
x3 | .050516 .0051398 9.83 0.000 .0404421 .0605899
_cons | -.8820319 1.575815 -0.56 0.576 -3.970573 2.206509
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Mind: Unstructured |
var(x1) | 7.47e-07 . . .
var(_cons) | .2079643 . . .
cov(x1,_cons) | -.0003941 . . .
-----------------------------+------------------------------------------------
var(Residual) | 613.2876 . . .
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 0.06 Prob > chi2 = 0.9964
Note: LR test is conservative and provided only for reference.
Fixed-effect estimates are good, and the residual variance is not far
wrong. The estimated variance of the slope random effect is
minuscule, but there's no convergence error.
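Given how small that slope variance is, perhaps the honest check is a
likelihood ratio test of the random slope, comparing against the
intercept-only fit:

mm4null <- lmer(y3 ~ x1 + x2 + x3 + (1 | Mind), data = dat)
anova(mm4null, mm4)  ## anova() refits both models with ML

As the Stata note says, such a boundary test is conservative.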
Maybe I'm not generating the random effect for the regression on x1
correctly. In the simulated data,
> rockchalk::summarize(Mreffects)
$numerics
Mreffects_1 Mreffects_2
0% -1.4122 -0.8380
25% -0.3799 -0.1887
50% -0.0438 -0.0244
75% 0.2899 0.0976
100% 0.9951 0.5505
mean -0.0615 -0.0495
sd 0.4650 0.2455
var 0.2162 0.0602
NA's 0.0000 0.0000
N 100.0000 100.0000
$factors
NULL
> cor(Mreffects)
b0 b1
b0 1.0000000 0.1445065
b1 0.1445065 1.0000000
--
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://crmda.ku.edu