[R-sig-ME] GLMM: tricking software to estimate normal error at lower level

Sun Apr 17 18:36:31 CEST 2016

Thanks for the help last week on GLMM estimators.  I'm gradually
figuring this out.

Today I'm writing notes comparing NB with Poisson models. Recall, NB
is Poisson with a log(gamma) error variable added to the linear
predictor.  I've often wondered how different that would be from
adding a normal error, but did not have way to estimate.

In Rabe-Hesketh and Skrondal Multilevel / Longitudinal Modeling Using
Stata, 3ed, they show a way to estimate a normally distributed random
error inserted into a Poisson GLM in a one level model by "tricking"
an estimator for a multilevel model (p. 707). I've checked, can do
with xtpoisson or meglm in Stata 14).  Code is below.

I have some trouble with glmer trying to do same. Estimates are close
to the "seems to work" answer from Stata, but I get glmer convergence
warning and different numbers, grossly different predicted values. I
understand glmer was not intended for this purpose, but wouldn't it be
fun if we could make it work?

Here's the basic plan in Stata that RHS demonstrate. Declare an id
variable with values 1, ..., N. This is non-replicated data, there
will be just one observation per group.

generate id = _n
xtset id
xtpoisson y x1 x2 x3, normal

The Stata xtpoisson defaults the additive random error as log(gamma),
but you can ask for normal. I've confirmed this runs, and it also
works in the newer Stata meglm fitting function (meglm is only for
normal errors).  Then you can compare side by side xtpoisson with
log(gamma), xtpoisson with normal, and meglm with family(poisson) and
normal mixed effects.

I fiddled around with glmer trying to do same, and the results are not
completely different, but I hit convergence errors. Since I don't
understand the fitting process, as I confessed last week, I'm a little
blocked here.

My minimal running example R code:

## Paul Johnson <pauljohn at ku.edu>
## 20160417

library(MASS)
quine$id <- 1:NROW(quine)

library(foreign)
write.dta(quine, file = "quine.dta12")
## Original model in MASS example(glm.nb)
quine.pois1 <- glm(Days ~ Sex/(Age + Eth*Lrn), data = quine, family =
poisson(log))
summary(quine.pois1)
## Don't know how to write Stata formula to compare to that, so
## rewrite without "/"
quine.pois2 <- glm(Days ~ Sex*Age + Sex*Eth*Lrn, family =
poisson(link=log), data = quine)
summary(quine.pois2)

quine.nb1 <- glm.nb(Days ~ Sex*Age + Sex*Eth*Lrn, data = quine)
summary(quine.nb1)
quine$predpois <- predict(quine.pois2, type = "link")
quine$prednb <- predict(quine.nb1, type = "link")

plot(predpois ~ prednb, data = quine)
with(quine, cor(predpois, prednb))

library(lme4)
# Stata uses  nAGQ=14 equivalent:
glmer1 <- glmer(Days ~ Sex*Age + Sex*Eth*Lrn + (1|id), data = quine,
                family = "poisson", nAGQ = 14)
summary(glmer1)
quine$predglmer1 <- predict(glmer1)
plot(predglmer1 ~ prednb, data = quine)

glmer2 <- glmer.nb(Days ~ Sex*Age + Sex*Eth*Lrn + (1|id), data =
quine, family = "poisson",
                control = glmerControl(optCtrl = list(maxfun = 10000)))

glmer3 <- glmer(Days ~ Sex*Age + Sex*Eth*Lrn + (1|id), data = quine,
family = "poisson",
                     control = glmerControl(optimizer = "Nelder_Mead",
optCtrl = list(maxfun = 10000)))
summary(glmer3)

## end of MRE

The output from glmer indicates convergence trouble,

> glmer1 <- glmer(Days ~ Sex*Age + Sex*Eth*Lrn + (1|id), data = quine, family = "poisson")
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.0294911 (tol = 0.001, component 1)
> summary(glmer1)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: Days ~ Sex * Age + Sex * Eth * Lrn + (1 | id)
   Data: quine

     AIC      BIC   logLik deviance df.resid
  1106.5   1151.3   -538.3   1076.5      131

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.56406 -0.27761  0.00195  0.20574  0.42374

Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.6947   0.8335
Number of obs: 146, groups:  id, 146

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      2.652416   0.318366   8.331  < 2e-16 ***
SexM            -0.333252   0.423300  -0.787 0.431123
AgeF1           -0.571727   0.345721  -1.654 0.098183 .
AgeF2           -0.539010   0.403068  -1.337 0.181134
AgeF3           -0.424210   0.351702  -1.206 0.227754
EthN             0.002994   0.286200   0.010 0.991652
LrnSL            0.834826   0.349273   2.390 0.016840 *
SexM:AgeF1      -0.035285   0.493283  -0.072 0.942976
SexM:AgeF2       1.278388   0.499877   2.557 0.010546 *
SexM:AgeF3       1.614987   0.487992   3.309 0.000935 ***
SexM:EthN       -0.918835   0.397881  -2.309 0.020926 *
SexM:LrnSL      -0.701193   0.499202  -1.405 0.160132
EthN:LrnSL      -1.312972   0.405854  -3.235 0.001216 **
SexM:EthN:LrnSL  2.211797   0.623020   3.550 0.000385 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I played around with glmerControl a while, no progress....

In Stata, no convergence troubles are reported, estimates are not
entirely different:

Mixed-effects GLM                               Number of obs     =        146
Family:                 Poisson
Link:                       log
Group variable:              id                 Number of groups  =        146

                                                Obs per group:
                                                              min =          1
                                                              avg =        1.0
                                                              max =          1

Integration method: mvaghermite                 Integration pts.  =         14

                                                Wald chi2(13)     =      61.55
Log likelihood = -537.82328                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
        Days |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Sex |
          M  |  -.3410261   .4261292    -0.80   0.424    -1.176224    .4941718
         Age |
         F1  |  -.5795131   .3480225    -1.67   0.096    -1.261625    .1025984
         F2  |  -.5478196   .4062568    -1.35   0.178    -1.344068     .248429
         F3  |  -.4316382   .3540887    -1.22   0.223    -1.125639    .2623628
     Sex#Age |
       M#F1  |  -.0280678   .4970979    -0.06   0.955    -1.002362    .9462261
       M#F2  |   1.284848   .5037952     2.55   0.011     .2974278    2.272269
       M#F3  |   1.620156   .4914708     3.30   0.001     .6568905    2.583421
         Eth |
          N  |  -.0025892   .2883566    -0.01   0.993    -.5677578    .5625793
     Sex#Eth |
        M#N  |   -.912496   .4009248    -2.28   0.023    -1.698294   -.1266978
         Lrn |
         SL  |   .8328745   .3519247     2.37   0.018     .1431148    1.522634
     Sex#Lrn |
       M#SL  |  -.6980045    .502695    -1.39   0.165    -1.683269    .2872596
     Eth#Lrn |
       N#SL  |  -1.305221   .4090007    -3.19   0.001    -2.106848   -.5035948
 Sex#Eth#Lrn |
     M#N#SL  |   2.200301   .6276649     3.51   0.000     .9701005    3.430502
       _cons |    2.66012   .3203568     8.30   0.000     2.032233    3.288008
-------------+----------------------------------------------------------------
id           |
   var(_cons)|   .7010502   .1077163                      .5187547    .9474061
------------------------------------------------------------------------------
LR test vs. Poisson model: chibar2(01) = 906.35       Prob >= chibar2 = 0.0000

Here is my Stata code, in case you have Stata 14

* How to use the Stata multilevel random effect
* fitters to estiamte a one level normal
* random effect

capture log close
set more off, permanently
log using glmm-nb.log, replace text

use quine.dta12
glm Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn, family(poisson) link(log)
glm Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn, family(nbinom) link(log)

* The variable id is a case identifier, there is one observation per group

xtset id
xtpoisson Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn, normal
estimates store ran1xt
predict ran1xtpxb, xb
predict ran1xtpmu, nu0

meglm Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn || id: , family(poisson) link(log)
estimates store ran1me
predict ran1mepxb, xb
predict ran1mepmu2, mu conditional(fixedonly)

* Previous 2 models identical predicted values
twoway (scatter ran1xtpxb ran1mepxb)

* Tune up the AGQ int points, waste some CPU
meglm Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn || id: , family(poisson) ///
    link(log) intp(14)
estimates store ran2me
predict ran2mepxb, xb
predict ran2mepmu2, mu conditional(fixedonly)

* 2 xtpoisson equiv to meglm
twoway (scatter ran1xtpxb ran2mepxb)
cor ran1xtpxb ran2mepxb

* same
twoway (scatter ran1xtpxb ran2mepxb)
cor ran1xtpxb ran2mepxb

* Compare to log-gamma random effect
xtpoisson Days i.Sex##i.Age i.Sex##i.Eth##i.Lrn, re
estimates store ran2xt
predict ran2xtlgxb, xb
predict ran2xtlgmu, nu0

twoway (scatter ran2xtlgxb ran2mepxb)
cor ran2xtlgxb ran2mepxb
* Not identical, r = 0.986

Oh, I almost forgot:

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] foreign_0.8-66 lme4_1.1-11    Matrix_1.2-4   MASS_7.3-44

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3.2      lattice_0.20-33    grid_3.2.4         MatrixModels_0.4-1
 [5] nlme_3.1-126       rockchalk_1.8.102  SparseM_1.7        minqa_1.2.4
 [9] nloptr_1.0.4       car_2.1-0          splines_3.2.4      tools_3.2.4
[13] pbkrtest_0.4-6     parallel_3.2.4     compiler_3.2.4     mgcv_1.8-12
[17] nnet_7.3-12        quantreg_5.19

pj

-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

When Google mail changed their sorting rules, I lost interest in their
services. I only use this account for email list memberships. To write
directly, address me at pauljohn
at ku.edu.