[R-sig-ME] prediction from glmer(..., family = binomial(link = "probit"))
Thierry.ONKELINX at inbo.be
Mon Jan 12 09:57:40 CET 2015
I think that the strong negative parameter estimates are due to the complete separation in your dataset. You might want to try Bayesian solutions like blme or inla.
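For reference, a minimal sketch of the blme route (a hedged illustration, not the poster's code: the data simulated here are a small stand-in for the quoted setup, and the prior is just blme's default normal prior on the fixed effects):

```r
library(blme)  # Bayesian extension of lme4; bglmer() mirrors glmer()

## Toy data in the spirit of the quoted simulation (sizes are illustrative)
set.seed(42)
fam  <- rep(1:100, each = 3)
liab <- qnorm(0.05) + rnorm(100, 0, 0.5)[fam] + rnorm(300, 0, sqrt(0.75))
simdat <- data.frame(fam = fam, resistant = liab > 0)

## A normal prior on the fixed effects keeps the intercept finite even
## when many families are all-FALSE (the separation problem above).
fit.bglmer <- bglmer(resistant ~ 1 + (1 | fam),
                     data   = simdat,
                     family = binomial(link = "probit"),
                     fixef.prior = normal)
fixef(fit.bglmer)
```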
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx at inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Paul Johnson
Sent: Friday, 9 January 2015 15:59
Aan: R-mixed models mailing list
Onderwerp: [R-sig-ME] prediction from glmer(..., family = binomial(link = "probit"))
I'm using glmer to fit a binomial GLMM with a probit link to some simulated data, and I'm looking for help in interpreting (back-transforming) the intercept estimate.
The response is logical (is a subject resistant to an infectious disease or not). Subjects are clustered in 700 families, with 3 subjects in each family:
       fam resistant
1        1     FALSE
2        1     FALSE
3        1     FALSE
4        2     FALSE
5        2     FALSE
6        2     FALSE
7        3     FALSE
8        3     FALSE
9        3     FALSE
...
2100   700     FALSE
The prevalence of resistance is low, at 5%, and the degree of familial clustering is moderate, with ICC = 0.25 (this is the ICC of the latent trait liability). The prevalence in the simulated data is 5.3%, close to the parameter value.
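The poster's simulation code isn't shown, but one plausible way to generate data with these properties (a threshold liability model with total latent variance 1, so the stated ICC and prevalence hold by construction) is:

```r
set.seed(1)
n.fam    <- 700
fam.size <- 3
icc  <- 0.25   # share of liability variance between families
prev <- 0.05   # target marginal prevalence of resistance

fam   <- rep(seq_len(n.fam), each = fam.size)
b.fam <- rnorm(n.fam, 0, sqrt(icc))                 # family effects, var 0.25
e     <- rnorm(n.fam * fam.size, 0, sqrt(1 - icc))  # residuals, var 0.75
liability <- qnorm(prev) + b.fam[fam] + e           # total latent variance 1

simdat <- data.frame(fam = fam, resistant = liability > 0)
mean(simdat$resistant)   # close to the target 5%
```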
I fitted the following model:
fit.glmer <- glmer(resistant ~ 1 + (1 | fam), family = binomial(link = "probit"), data = simdat)
The intercept estimate is -3.79. Back-transforming this gives a very low prevalence estimate of about 0.01% when I marginalise over the random effect:
> predict(fit.glmer, type = "response", re.form = NA)
Averaging over the family-specific predicted prevalences gets closer, but still underestimates the true prevalence (this bias is repeatable, so it's not just due to sampling of families):
> mean(predict(fit.glmer, type = "response", re.form = NULL))
This does better because back-transforming before taking the mean draws in the predicted prevalence of the families with no resistant subjects, which is very low on the probit scale but bounded below by zero on the proportion scale. However, I'm pretty sure this is the wrong way to do it.
Perhaps a more likely explanation is that I'm wrong to assume variance = 1 in the inverse probit function? Should I be adding the inter-family variance, the additive dispersion (is this fixed at 1?) and the distribution-specific variance, as predict.MCMCglmm does (I've fitted the same model in MCMCglmm and it recovers the prevalence and ICC accurately)? However, the family variance estimate is so large that including it results in large overestimates, and the fact that predict.merMod doesn't do this suggests it isn't necessary.
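For what it's worth, under the probit link this marginalisation has a well-known closed form: if u ~ N(0, sigma2), then E[pnorm(b0 + u)] = pnorm(b0 / sqrt(1 + sigma2)). A quick numerical check (the sigma2 value here is purely illustrative, not the fitted family variance):

```r
## Closed-form marginalisation for the probit link:
##   E[ pnorm(b0 + u) ] = pnorm( b0 / sqrt(1 + sigma2) ),  u ~ N(0, sigma2)
set.seed(1)
b0     <- -3.79  # intercept from the quoted fit
sigma2 <- 4.3    # ILLUSTRATIVE family variance, not an estimate

u <- rnorm(1e6, 0, sqrt(sigma2))
monte.carlo <- mean(pnorm(b0 + u))           # brute-force average
closed.form <- pnorm(b0 / sqrt(1 + sigma2))  # exact marginal prevalence
c(monte.carlo, closed.form)                  # the two agree
```

If this identity applies, the conditional intercept can be much more negative than qnorm(marginal prevalence) whenever the family variance is large, which would account for the apparent underestimate from back-transforming the intercept directly.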
Alternatively, could glmer be overestimating the inter-family variance, leading to an underestimate of the prevalence, due to a combination of low prevalence and small families (n = 3; most families have 0/3 resistant, and the maximum is 2/3)?
I'm exploring these possibilities but it would be great to have some expert advice.
Thanks in advance,
PS I could post the simulation code but it's rather long so I'll wait and see if there's a simple solution.
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models