[R-sig-ME] c++ exception with logistic glmer
Cole, Tim
tim.cole at ucl.ac.uk
Tue Jan 6 13:34:57 CET 2015
I'm returning to a thread I started a year ago. My logistic glmer model (described below) generated an obscure error message, which Emmanuel Curis pointed out arose from duplicate rows in my data. Ben Bolker and David Duffy thought there might be separation in the cleaned dataset, and Jonathan French and Ben suggested an alternative time-to-event analysis instead of the logistic.
The dataset consists of longitudinal measures of a binary bone maturity score called mat, which for each subject (n = 607) consists of zero or more 0s followed by zero or more 1s, the age at the 0/1 transition constituting the subject's age at maturity. The aim is to estimate median age at maturity, given by minus the intercept divided by the age coefficient, with a confidence interval based on Fieller's theorem.
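For reference, here is a minimal sketch of the Fieller calculation, assuming a fitted binomial model called fit with (Intercept) and age terms (for a glmer fit substitute fixef() for coef()):

b <- unname(coef(fit))                         # c(intercept, age slope)
V <- vcov(fit)                                 # their variance-covariance matrix
z <- qnorm(0.975)
A <- b[2]^2 - z^2 * V[2, 2]
B <- b[1] * b[2] - z^2 * V[1, 2]
C <- b[1]^2 - z^2 * V[1, 1]
med <- -b[1] / b[2]                            # median age at maturity
ci <- (-B + c(-1, 1) * sqrt(B^2 - A * C)) / A  # Fieller 95% limits
c(estimate = med, lower = ci[1], upper = ci[2])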
The full data frame is bhs (4565 rows), while dfs is a reduced data frame (1086 rows) consisting of the last mat 0 entry and first mat 1 entry per subject (where available).
Using glm ignores the longitudinal element, while glmer allows a random subject intercept reflecting inter-subject variability in age at maturity.
lm0 <- glm(mat ~ age, family=binomial, data=bhs)
lm0r <- glm(mat ~ age, family=binomial, data=dfs)
for (n in c(0, 1, 5, 9)) {
  assign(paste('lm0', n, sep='.'),
         glmer(mat ~ age + (1 | BHID), family=binomial, data=bhs, nAGQ=n))
  assign(paste('lm0r', n, sep='.'),
         glmer(mat ~ age + (1 | BHID), family=binomial, data=dfs, nAGQ=n))
}
mods <- ls(pattern='^lm0')
unlist(lapply(mods, function(z) {
  res <- BIC(get(z))
  names(res) <- z
  res
}))
lm0 lm0.0 lm0.1 lm0.5 lm0.9 lm0r lm0r.0 lm0r.1 lm0r.5 lm0r.9
2474.576 1812.824 1386.288 1788.585 1778.761 1298.951 1305.941 1305.941 1305.941 1305.941
With the full data frame (models lm0.n) BIC varies with nAGQ, and for the Laplace fit (nAGQ=1) the model looks dodgy. The coefficients are as follows:
$lm0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.700922 0.5783898 -32.33273 2.426656e-229
age 1.169465 0.0363381 32.18288 3.063925e-227
$lm0.0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -75.261703 3.3652807 -22.36417 8.790724e-111
age 4.710325 0.2086032 22.58031 6.766768e-113
$lm0.1
Estimate Std. Error z value Pr(>|z|)
(Intercept) -276.41591 1.21959198 -226.6462 0
age 17.35977 0.08850143 196.1525 0
$lm0.5
Estimate Std. Error z value Pr(>|z|)
(Intercept) -71.992747 6.7038887 -10.73895 6.679749e-27
age 4.497433 0.4186714 10.74215 6.452210e-27
$lm0.9
Estimate Std. Error z value Pr(>|z|)
(Intercept) -80.623844 8.9423308 -9.015976 1.951243e-19
age 5.035102 0.5569928 9.039797 1.569631e-19
As expected the regression line is steeper with the random effect included, though again lm0.1 looks odd.
With the reduced data frame the lm0r.n models do not vary with nAGQ. But odder still, the coefficients are identical with and without the random effect, because the SD of the random effect is estimated as zero.
$lm0r
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.6820194 0.69870600 -12.42586 1.891928e-35
age 0.5393926 0.04395768 12.27072 1.300837e-34
$lm0r.0
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.6820193 0.69864489 -12.42694 1.866391e-35
age 0.5393926 0.04395395 12.27177 1.284216e-34
So I have two questions. Why do the results with the full data frame depend on nAGQ, and why with the reduced data frame is the SD of random effect estimated as zero?
Thanks for your thoughts.
Tim
---
Tim.Cole at ucl.ac.uk  Phone +44(0)20 7905 2666  Fax +44(0)20 7905 2381
Population, Policy and Practice Programme
UCL Institute of Child Health, London WC1N 1EH, UK
Date: Mon, 06 Jan 2014 08:43:21 -0500
From: Ben Bolker <bbolker at gmail.com>
To: "Cole, Tim" <tim.cole at ucl.ac.uk>,
	"r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] c++ exception with logistic glmer
Message-ID: <52CAB2F9.50306 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On 14-01-06 04:04 AM, Cole, Tim wrote:
The dataset consists of longitudinal measures of bone score during
puberty, where 1000 indicates maturity. The aim is to estimate median
age at maturity in four groups by sex and ethnicity.
This glm code works fine, but ignores the longitudinal element.
lm2 <- glm(I(RUSBoneScore == 1000) ~ log(DecAge) + Sex * Ethnicity,
           family=binomial, data=na.omit(bh[, 3:6]))
This code, which adds a random subject effect, fails with the unhelpful error message below.
lm3 <- glmer(I(RUSBoneScore == 1000) ~ log(DecAge) + Sex * Ethnicity + (1 | BHID),
             family=binomial, data=na.omit(bh[, 2:6]))
Error in pwrssUpdate(pp, resp, tolPwrss, GHrule(0L), compDev,
verbose) : c++ exception (unknown reason)
I've tried various alternatives but they all fail in the same way.
Thoughts please.
Hard to say without a reproducible example. My main comment is that
this is *not* a problem I have seen anyone report before. sessionInfo()
please? Can you try with verbose=100? Can you make a reproducible
example that is a reasonable size and doesn't have confidentiality problems?
Ben Bolker
Thanks, Tim Cole -- Tim.Cole at ucl.ac.uk
Phone +44(0)20 7905 2666 Fax +44(0)20 7905 2381
Centre for Paediatric Epidemiology and Biostatistics
UCL Institute of Child Health, London WC1N 1EH, UK
------------------------------
Message: 2
Date: Tue, 7 Jan 2014 02:08:06 +0000 (UTC)
From: Ben Bolker <bbolker at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Fit negative binomial glmm credible intervals
from the predict function?
Message-ID: <loom.20140107T030759-892 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Sheryn Olson <sheryn.olson at ...> writes:
Hello Ben and all,
Thank you very much, Ben, for your previous reply. Yes, stand/plot makes
sense. After talking with a few statisticians, I got convergence! The
repeated-measures issue is solvable with a dummy variable for each of the 3
years (like a fiscal or school year), so yr1 = 2010smr + 2011wtr, and so
on. Pellet plots were sampled 6 times, 3 each season; vegetation only
once. Among-year variability can be high.
I'm now attempting to make predictive plots from the models. I used the
method posted at UCLA's IDRE site, which has a negative binomial example:
http://www.ats.ucla.edu/stat/r/dae/nbreg.htm
newdata2 <- cbind(newdata2,
                  predict(m1, newdata2, type = "link", se.fit = TRUE))
newdata2 <- within(newdata2, {
  DaysAbsent <- exp(fit)
  LL <- exp(fit - 1.96 * se.fit)
  UL <- exp(fit + 1.96 * se.fit)
})
but I have an uneasy feeling about fitting standard errors from a
Gaussian distribution.
In case it helps, what you're assuming is Normal is the *sampling
distribution of the parameters*, not the data themselves ...
The bottom line is that what you're doing seems reasonable, with
the caveats already expressed in ?predict.glmmadmb ...
In the past I've used MCMCglmm to estimate credible intervals around the
beta coefficients of a categorical variable, but now I have a continuous
variable, conifer sapling count.
Well, technically it's a discrete (count) variable, not continuous ...
Section 2 of the glmmADMB vignette [vignette("glmmADMB")] has
a bit of information about running post-fit MCMC. There should
be an example of how to generate confidence intervals (basically,
you would generate predictions for each sampled set of coefficients
and then compute the quantiles or credible intervals of the
predictions), but there isn't (yet ...)
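For what it's worth, here's a rough sketch of that idea (illustrative only:
it assumes the model was refitted with mcmc=TRUE, that the fixed-effect
draws can be pulled out of the fit's $mcmc component by name, and that X is
a model matrix for a prediction frame newdata, built as in the recipe below):

post <- as.matrix(fit$mcmc)                         # posterior draws, coerced from the coda object
post <- post[, names(fixef(fit)), drop = FALSE]     # keep the fixed-effect columns (names assumed to match)
pred <- exp(X %*% t(post) + newdata$ln.days)        # response-scale predictions (offset added), one column per draw
ci <- t(apply(pred, 1, quantile, c(0.025, 0.975)))  # pointwise 95% credible intervals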
So, lots of questions!
How does one make an accurate predictive graph from a nbinom model?
Good question. Not easy, you may need to take some shortcuts
and hope for the best.
Does it make sense to set up a dataframe with the means of the
covariates when the covariates'
distributions were skewed count data?
It depends what values you want to predict for. The prediction data frame
typically holds the other covariates at "typical" values (often their means);
you're welcome to compute predictions for the medians instead if you prefer,
or for a range of values.
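For instance, a minimal sketch of such a prediction frame (the object and
column names are borrowed from the code further down; otherwise they are
assumptions):

newdat <- expand.grid(
  t.con.splgs = seq(min(hv$t.con.splgs), max(hv$t.con.splgs), length.out = 100),
  season = levels(hv$season))                   # one curve per season
newdat$t.dec.trees <- median(hv$t.dec.trees)    # hold the other covariates at their medians
newdat$pctMidCov <- median(hv$pctMidCov)
newdat$t.BAtrees <- median(hv$t.BAtrees)
newdat$cc <- median(hv$cc)
newdat$ln.days <- log(30.25)                    # offset: log(days per month)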
How do I use MCMCglmm correctly?
?? Do you mean the 'mcmc' argument of glmmadmb?
Is "se.fit" valid for the nbinom log link, or....?
[Update: I just wrote a bunch of stuff about how se.fit doesn't
work for glmmadmb, when it indeed does! However, I had already
written all the stuff below, which describes more generally how
you would do it by hand, so I'm just going to send it anyway]
The basic recipe for constructing confidence intervals, as laid out
in http://glmm.wikidot.com/faq in the "Predictions and/or confidence
(or prediction) intervals on predictions" section, is
(1) construct the model matrix (X)
* If you want predictions for the original data (not this case),
the model.matrix() accessor may work to extract it from the fit for you
* If not, or if you want to construct predictions for new data,
you need to call model.matrix() with the fixed-effect formula (you
may or may not be able to extract the fixed-effect part of the
formula from the model, but it's usually fairly easy just
to respecify it)
(2) extract the variance-covariance matrix of the fixed effect
predictors, V (usually just vcov(model))
(3) extract the fixed-effect coefficients (usually just fixef(model)
or coef(model))
then the predictions (on the linear predictor scale, at the population
level [i.e. ignoring all random effects]) are
X %*% beta
(plus an offset if there is one)
and the standard errors of prediction, *CONDITIONAL ON THE
RANDOM EFFECTS* [i.e. ignoring all uncertainty due to uncertainty
of the random effects] are
sqrt(diag(X %*% V %*% t(X)))
Then to get confidence intervals you should compute
inverse_link(fit +/- 1.96*se.fit) exactly as you have below
This assumes further that the sampling distribution of the
coefficients is Normal (in which case the linear computation
we did above will also lead to a Normal distribution, on
the linear predictor scale). This might not be true for
small or badly behaved data sets (of course it is never
exactly true except asymptotically ...), but there's not
much you can do about it without working much harder.
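Pulling the recipe together in R (a minimal sketch only: 'fit' and 'newdata'
stand for your fitted glmmadmb model and prediction frame, and the
fixed-effect formula is respecified by hand):

X <- model.matrix(~ season * (t.con.splgs + t.dec.trees + pctMidCov + t.BAtrees + cc),
                  data = newdata)          # (1) model matrix for the new data
V <- vcov(fit)                             # (2) variance-covariance matrix of the fixed effects
beta <- fixef(fit)                         # (3) fixed-effect coefficients
eta <- drop(X %*% beta) + newdata$ln.days  # linear predictor, plus the offset
se <- sqrt(diag(X %*% V %*% t(X)))         # SE of prediction, conditional on the random effects
LL <- exp(eta - 1.96 * se)                 # back to the response scale via the inverse (log) link
UL <- exp(eta + 1.96 * se)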
my Code:
########## CONIFER SAPLINGS by pellets/ha/month (phm) #####################
#### mod1.all <- glmmadmb(pellets ~ season * (t.con.splgs + t.dec.trees +
####                        pctMidCov + t.BAtrees + cc) + offset(ln.days) +
####                        (1 | stand/plot) + (1 | hareyr),
####                      data = hv, family = "nbinom")
predCS <- data.frame(
  t.con.splgs = rep(seq(from = min(hv$t.con.splgs),
                        to = max(hv$t.con.splgs), length.out = 100), 2),
  t.dec.trees = rep(mean(hv$t.dec.trees), 200),
  pctMidCov = rep(mean(hv$pctMidCov), 200),
  t.BAtrees = rep(mean(hv$t.BAtrees), 200),
  cc = rep(mean(hv$cc), 200),
  ln.days = rep(log(30.25), 200),
  season = factor(rep(1:2, each = 100), levels = 1:2,
                  labels = levels(hv$season)))
predCS <- cbind(predCS, predict(mod1.all, predCS, type = "link", se.fit = TRUE))
predCS <- within(predCS, {
  pellets <- exp(fit)
  LL <- exp(fit - 1.96 * se.fit) # these seem wrong !!
  UL <- exp(fit + 1.96 * se.fit) # ??
})
Do you mean you're suspicious of the code, or that the numbers
it produces seem wrong?
head(predCS) # check to see fits
  t.con.splgs t.dec.trees pctMidCov t.BAtrees       cc  ln.days season       fit    se.fit       UL       LL   pellets
1   0.0000000    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.051665 0.5187013 6437.304 842.6536 0.3493557
2   0.7142493    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.046329 0.5187194 6471.975 847.1320 0.3512248
3   1.4284985    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.040993 0.5187633 6507.163 851.5911 0.3531040
4   2.1427478    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.035657 0.5188331 6542.872 856.0304 0.3549932
5   2.8569971    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.030321 0.5189286 6579.111 860.4492 0.3568926
6   3.5712464    3.493585  40.80654  4.117168 82.84937 3.409496    smr -1.024984 0.5190500 6615.885 864.8472 0.3588020
con.splgs.ha phm
1 0.00000 2329.038
2 5.10152 2341.499
3 20.40608 2354.027
4 45.91368 2366.622
5 81.62432 2379.284
6 127.53801 2392.014
############# pellets/ha/month (phm) scale up
############# and scale up saplings to per ha from 0.1ha plot level
predCS <- within(predCS, {
  pellets <- exp(fit)
  phm <- pellets / 1.5 * 10000
  LL <- (exp(fit - 1.96 * se.fit)) / 1.5 * 10000
  UL <- (exp(fit + 1.96 * se.fit)) / 1.5 * 10000
  con.splgs.ha <- 10 * (t.con.splgs^2)  # backtransform square root and scale up
})
CS <- ggplot(predCS, aes(con.splgs.ha, phm)) +
theme_bw() + #eliminates background, gridlines, but not border
theme(
plot.background = element_blank()
,panel.grid.major = element_blank()
,panel.grid.minor = element_blank()
# ,panel.border = element_blank()
,panel.background = element_blank()
) +
geom_ribbon(aes(ymin = LL, ymax = UL, fill = season),alpha = .25) +
#geom_line(aes(colour = season),size = 1) +
geom_smooth(aes(colour=season, linetype=season),size = 1, se = F) +
scale_colour_manual(values=c("wtr"= 4, "smr" = 3)) +
scale_x_continuous(limits = c(0, 25000)) +
scale_y_continuous(limits = c(0, 12000)) +
scale_fill_brewer(palette="Accent") +
theme(legend.position = 'none') + #get rid of the legend
labs(x = "Conifer Sapling Count per ha", y = "Predicted Pellets/ha/month")
print(CS)
Thanks for any ideas.
Sheryn