[R-sig-ME] c++ exception with logistic glmer

Thu Jan 8 23:08:08 CET 2015

The different log likelihoods for different values of nAGQ, and hence BIC,
is a known problem with glmer. nAGQ=1 is correct, and I think other values
are just out by a constant.

As to your model, what is happening is that a step function is being
modeled by a logistic. As a result the coefficient of age tends towards a
large number (a parameter estimate of 5 corresponds to an odds ratio of
about 150 per year, 17 is a huge odds ratio) , in an attempt  to reproduce
the step. The random effect will be modeling where the switch happens so
will also be large, as it needs to move a very steep line to varying values
of age. Effectively the model is close to being non-identifiable, as the
loglikelihood will remain almost unchanged for increasing combinations of
age parameter and random effect variance. Centering age might improve the
numerics and result in even larger coefficients.

I'm not certain what is happening with the 2 observations per subject. If
they were all 0,1 then the coefficient of age would be infinite, as a one
year change in age would always produce a change from zero to one.
However, it is similar to the discrete logistic survival which has a series
of zeroes followed by a 1 and doesn't require a random effect to model.

Ken

On 6 January 2015 at 23:34, Cole, Tim <tim.cole at ucl.ac.uk> wrote:

> I'm returning to a thread I started a year ago. My logistic glmer model
> (described below) generated an obscure error message, which Emmanuel Curis
> pointed out arose from duplicate rows in my data. Ben Bolker and David
> Duffy thought there might be separation in the cleaned dataset, and
> Jonathan French and Ben suggested an alternative time-to-event analysis
> instead of the logistic.
>
> The dataset consists of longitudinal measures of a binary bone maturity
> score called mat, which for each subject (n = 607) consists of zero or more
> 0s followed by zero or more 1s, the age at 0/1 transition constituting the
> subject's age at maturity. The aim is to estimate median age at maturity,
> given by –intercept/age coefficient and confidence interval based on
> Fieller's theorem.
>
> The full data frame is bhs (4565 rows), while dfs is a reduced data frame
> (1086 rows) consisting of the last mat 0 entry and first mat 1 entry per
> subject (where available).
>
> Using glm ignores the longitudinal element, whille glmer allows a random
> subject intercept reflecting inter-subject variability in age at maturity.
>
> lm0 <- glm(mat ~ age, family=binomial, data=bhs)
> lm0r <- glm(mat ~ age, family=binomial, data=dfs)
> for (n in c(0, 1, 5, 9)) {
> assign(paste('lm0', n, sep='.'), glmer(mat ~ age + (1 | BHID),
> family=binomial, data=bhs, nAGQ=n))
> assign(paste('lm0r', n, sep='.'), glmer(mat ~ age + (1 | BHID),
> family=binomial, data=dfs, nAGQ=n))
> }
> list <- ls(pattern='^lm0')
> unlist(list, function(z) {
> res <- BIC(get(z))
> names(res) <- z
> res
> }))
>              lm0    lm0.0    lm0.1    lm0.5    lm0.9     lm0r   lm0r.0
>  lm0r.1   lm0r.5   lm0r.9
> 2474.576 1812.824 1386.288 1788.585 1778.761 1298.951 1305.941 1305.941
> 1305.941 1305.941
>
> With the full data frame (models lm0.n) BIC varies with nAGQ, and for the
> Laplace fit (nAGQ=1) the model looks dodgy. The coefficients are as follows:
>
> $lm0
>               Estimate Std. Error   z value      Pr(>|z|)
> (Intercept) -18.700922  0.5783898 -32.33273 2.426656e-229
> age           1.169465  0.0363381  32.18288 3.063925e-227
>
> $lm0.0
>               Estimate Std. Error   z value      Pr(>|z|)
> (Intercept) -75.261703  3.3652807 -22.36417 8.790724e-111
> age           4.710325  0.2086032  22.58031 6.766768e-113
>
> $lm0.1
>               Estimate Std. Error   z value Pr(>|z|)
> (Intercept) -276.41591 1.21959198 -226.6462        0
> age           17.35977 0.08850143  196.1525        0
>
> $lm0.5
>               Estimate Std. Error   z value     Pr(>|z|)
> (Intercept) -71.992747  6.7038887 -10.73895 6.679749e-27
> age           4.497433  0.4186714  10.74215 6.452210e-27
>
> $lm0.9
>               Estimate Std. Error   z value     Pr(>|z|)
> (Intercept) -80.623844  8.9423308 -9.015976 1.951243e-19
> age           5.035102  0.5569928  9.039797 1.569631e-19
>
> As expected the regression line is steeper with the random effect
> included, though again lm0.1 look odd.
>
> With the reduced data frame the lmr.n models do not vary with nAGQ. But
> the odder thing is that the coefficients are identical with and without the
> random effect, as the SD of the random effect is estimated to be zero.
>
> $lm0r
>               Estimate Std. Error   z value     Pr(>|z|)
> (Intercept) -8.6820194 0.69870600 -12.42586 1.891928e-35
> age          0.5393926 0.04395768  12.27072 1.300837e-34
>
> $lm0r.0
>               Estimate Std. Error   z value     Pr(>|z|)
> (Intercept) -8.6820193 0.69864489 -12.42694 1.866391e-35
> age          0.5393926 0.04395395  12.27177 1.284216e-34
>
> So I have two questions. Why do the results with the full data frame
> depend on nAGQ, and why with the reduced data frame is the SD of random
> effect estimated as zero?
>
> Thanks for your thoughts.
>
> Tim
>  ---
> Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666
> Fax +44(0)20 7905 2381
> Population, Policy and Practice Programme
> UCL Institute of Child Health, London WC1N 1EH, UK
>
>
>
>

-- 

*Ken Beath*
Lecturer
Statistics Department
MACQUARIE UNIVERSITY NSW 2109, Australia

Phone: +61 (0)2 9850 8516

Building E4A, room 526
http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/

CRICOS Provider No 00002J
This message is intended for the addressee named and may...{{dropped:9}}