[R-sig-ME] c++ exception with logistic glmer
Cole, Tim
tim.cole at ucl.ac.uk
Wed Jan 14 10:37:10 CET 2015
Thanks Ken for your insights, and also for David Duffys thoughts (sent offline):
The reduced model has an undefined intraclass correlation for the binary
maturity variable (every cluster is 0-1), so this is why the variance is
zero. It seems to me, as I mentioned earlier, that if you add additional
time points on either side of the transition to maturity the slope
coefficient for each individual will (arbitrarily) flatten out, which
implies the random intercept model isn't doing what you want it to.
Certainly, the empirical within-individual correlation between
observations is not uniform, which is the simplest RE model.
The message is clear, that this model is not going to work, which is a shame. I would point out though that all the models give reasonable estimates of the median age at maturity, -intercept/slope, which is one of the summary statistics Im interested in.
Tim
---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk> Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Population Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK
From: Ken Beath <ken.beath at mq.edu.au<mailto:ken.beath at mq.edu.au>>
Date: Thursday, 8 January 2015 22:08
To: Tim Cole <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>>
Cc: Ben Bolker <bbolker at gmail.com<mailto:bbolker at gmail.com>>, "r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>" <r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer
The different log likelihoods for different values of nAGQ, and hence BIC, is a known problem with glmer. nAGQ=1 is correct, and I think other values are just out by a constant.
As to your model, what is happening is that a step function is being modeled by a logistic. As a result the coefficient of age tends towards a large number (a parameter estimate of 5 corresponds to an odds ratio of about 150 per year, 17 is a huge odds ratio) , in an attempt to reproduce the step. The random effect will be modeling where the switch happens so will also be large, as it needs to move a very steep line to varying values of age. Effectively the model is close to being non-identifiable, as the loglikelihood will remain almost unchanged for increasing combinations of age parameter and random effect variance. Centering age might improve the numerics and result in even larger coefficients.
I'm not certain what is happening with the 2 observations per subject. If they were all 0,1 then the coefficient of age would be infinite, as a one year change in age would always produce a change from zero to one. However, it is similar to the discrete logistic survival which has a series of zeroes followed by a 1 and doesn't require a random effect to model.
Ken
On 6 January 2015 at 23:34, Cole, Tim <tim.cole at ucl.ac.uk<mailto:tim.cole at ucl.ac.uk>> wrote:
I'm returning to a thread I started a year ago. My logistic glmer model (described below) generated an obscure error message, which Emmanuel Curis pointed out arose from duplicate rows in my data. Ben Bolker and David Duffy thought there might be separation in the cleaned dataset, and Jonathan French and Ben suggested an alternative time-to-event analysis instead of the logistic.
The dataset consists of longitudinal measures of a binary bone maturity score called mat, which for each subject (n = 607) consists of zero or more 0s followed by zero or more 1s, the age at 0/1 transition constituting the subject's age at maturity. The aim is to estimate median age at maturity, given by intercept/age coefficient and confidence interval based on Fieller's theorem.
The full data frame is bhs (4565 rows), while dfs is a reduced data frame (1086 rows) consisting of the last mat 0 entry and first mat 1 entry per subject (where available).
Using glm ignores the longitudinal element, whille glmer allows a random subject intercept reflecting inter-subject variability in age at maturity.
lm0 <- glm(mat ~ age, family=binomial, data=bhs)
lm0r <- glm(mat ~ age, family=binomial, data=dfs)
for (n in c(0, 1, 5, 9)) {
assign(paste('lm0', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=bhs, nAGQ=n))
assign(paste('lm0r', n, sep='.'), glmer(mat ~ age + (1 | BHID), family=binomial, data=dfs, nAGQ=n))
}
list <- ls(pattern='^lm0')
unlist(list, function(z) {
res <- BIC(get(z))
names(res) <- z
res
}))
lm0 lm0.0 lm0.1 lm0.5 lm0.9 lm0r lm0r.0 lm0r.1 lm0r.5 lm0r.9
2474.576 1812.824 1386.288 1788.585 1778.761 1298.951 1305.941 1305.941 1305.941 1305.941
With the full data frame (models lm0.n) BIC varies with nAGQ, and for the Laplace fit (nAGQ=1) the model looks dodgy. The coefficients are as follows:
$lm0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.700922 0.5783898 -32.33273 2.426656e-229
age 1.169465 0.0363381 32.18288 3.063925e-227
$lm0.0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -75.261703 3.3652807 -22.36417 8.790724e-111
age 4.710325 0.2086032 22.58031 6.766768e-113
$lm0.1
Estimate Std. Error z value Pr(>|z|)
(Intercept) -276.41591 1.21959198 -226.6462 0
age 17.35977 0.08850143 196.1525 0
$lm0.5
Estimate Std. Error z value Pr(>|z|)
(Intercept) -71.992747 6.7038887 -10.73895 6.679749e-27
age 4.497433 0.4186714 10.74215 6.452210e-27
$lm0.9
Estimate Std. Error z value Pr(>|z|)
(Intercept) -80.623844 8.9423308 -9.015976 1.951243e-19
age 5.035102 0.5569928 9.039797 1.569631e-19
As expected the regression line is steeper with the random effect included, though again lm0.1 look odd.
With the reduced data frame the lmr.n models do not vary with nAGQ. But the odder thing is that the coefficients are identical with and without the random effect, as the SD of the random effect is estimated to be zero.
$lm0r
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.6820194 0.69870600 -12.42586 1.891928e-35
age 0.5393926 0.04395768 12.27072 1.300837e-34
$lm0r.0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.6820193 0.69864489 -12.42694 1.866391e-35
age 0.5393926 0.04395395 12.27177 1.284216e-34
So I have two questions. Why do the results with the full data frame depend on nAGQ, and why with the reduced data frame is the SD of random effect estimated as zero?
Thanks for your thoughts.
Tim
---
Tim.Cole at ucl.ac.uk<mailto:Tim.Cole at ucl.ac.uk><mailto:Tim.Cole at ich.ucl.ac.uk<mailto:Tim.Cole at ich.ucl.ac.uk>> Phone +44(0)20 7905 2666<tel:%2B44%280%2920%207905%202666> Fax +44(0)20 7905 2381<tel:%2B44%280%2920%207905%202381>
Population, Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK
--
Ken Beath
Lecturer
Statistics Department
MACQUARIE UNIVERSITY NSW 2109, Australia
Phone: +61 (0)2 9850 8516
Building E4A, room 526
http://stat.mq.edu.au/our_staff/staff_-_alphabetical/staff/beath,_ken/
CRICOS Provider No 00002J
This message is intended for the addressee named and may...{{dropped:10}}
More information about the R-sig-mixed-models
mailing list