[R-sig-ME] Predictions from zero-inflated or hurdle models
Ruben Arslan
rubenarslan at gmail.com
Tue Mar 10 13:22:26 CET 2015
Dear Dr Bolker,
I'd thought about something like this, one point of asking was to see whether
a) it's implemented already, because I'll probably make dumb mistakes while
trying b) it's not implemented because it's a bad idea.
Your response and the MCMCglmm course notes make me hope that it's c) not
implemented because nobody did yet or d) it's so simple that everybody does
it on-the-fly.
So I tried my hand and would appreciate corrections. I am sure there is
some screw-up or an inelegant approach in there.
I included code for dealing with mcmc.lists because that's what I have and
I'm not entirely sure how I deal with them is correct either.
I started with a zero-altered model, because those fit fastest and
according to the course notes have the least complex likelihood.
Because I know not what I do, I'm not dealing with my random effects at all.
I pasted a model summary below to show what I've applied the below function
to. The function gives the following quantiles when applied to 19 chains of
that model.
5% 50% 95%
5.431684178 5.561211207 5.690655200
5% 50% 95%
5.003974382 5.178192327 5.348246558
Warm regards,
Ruben Arslan
HPDpredict_za = function(object, predictor) {
if(class(object) != "MCMCglmm") {
if(length( object[[1]]$Residual$nrt )>1) {
object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
}
Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
vars = colnames(Sol[[1]])
} else {
Sol = as.data.frame(object$Sol)
vars = names(Sol)
}
za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
"traitza_"]
za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
"traitza_"]
intercept = Sol[,"(Intercept)"]
za_intercept = Sol[, za_intercept_name]
l1 = Sol[, predictor ]
l2 = Sol[, za_predictor ]
if(is.list(object)) {
intercept = unlist(intercept)
za_intercept = unlist(za_intercept)
l1 = unlist(l1)
l2 = unlist(l2)
}
py_0 = dpois(0, exp(intercept + za_intercept))
y_ygt0 = exp(intercept)
at_intercept = (1-py_0) * y_ygt0
py_0 = dpois(0, exp(intercept + za_intercept + l2))
y_ygt0 = exp(intercept + l1)
at_predictor_1 = (1-py_0) * y_ygt0
print(qplot(at_intercept))
print(qplot(at_predictor_1))
df = data.frame("intercept" = at_intercept)
df[, predictor] = at_predictor_1
print(qplot(x=variable,
y=value,data=suppressMessages(melt(df)),fill=variable,alpha=I(0.40), geom =
'violin'))
print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
invisible(df)
}
> summary(object[[1]])
Iterations = 100001:299901
Thinning interval = 100
Sample size = 2000
DIC: 349094
G-structure: ~idh(trait):idParents
post.mean l-95% CI u-95% CI eff.samp
children.idParents 0.0189 0.0164 0.0214 1729
za_children.idParents 0.2392 0.2171 0.2622 1647
R-structure: ~idh(trait):units
post.mean l-95% CI u-95% CI eff.samp
children.units 0.144 0.139 0.148 1715
za_children.units 1.000 1.000 1.000 0
Location effects: children ~ trait * (maternalage.factor + paternalloss +
maternalloss + center(nr.siblings) + birth.cohort + urban + male +
paternalage.mean + paternalage.diff)
post.mean l-95% CI u-95% CI
eff.samp pMCMC
(Intercept) 2.088717 2.073009 2.103357
2000 <0.0005 ***
traitza_children -1.933491 -1.981945 -1.887863
2000 <0.0005 ***
maternalage.factor(14,20] 0.007709 -0.014238 0.027883
1500 0.460
maternalage.factor(35,50] 0.006350 -0.009634 0.024107
2000 0.462
paternallossTRUE 0.000797 -0.022716 0.025015
2000 0.925
maternallossTRUE -0.015542 -0.040240 0.009549
2000 0.226
center(nr.siblings) 0.005869 0.004302 0.007510
2000 <0.0005 ***
birth.cohort(1703,1722] -0.045487 -0.062240 -0.028965
2000 <0.0005 ***
birth.cohort(1722,1734] -0.055872 -0.072856 -0.036452
2000 <0.0005 ***
birth.cohort(1734,1743] -0.039770 -0.056580 -0.020907
2000 <0.0005 ***
birth.cohort(1743,1750] -0.030713 -0.048301 -0.012214
2000 0.002 **
urban -0.076748 -0.093240 -0.063002
2567 <0.0005 ***
male 0.106074 0.095705 0.115742
2000 <0.0005 ***
paternalage.mean -0.024119 -0.033133 -0.014444
2000 <0.0005 ***
paternalage.diff -0.018367 -0.032083 -0.005721
2000 0.007 **
traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978
1876 0.001 ***
traitza_children:maternalage.factor(35,50] -0.045196 -0.094485 0.002640
2000 0.075 .
traitza_children:paternallossTRUE -0.171957 -0.238218 -0.104820
2000 <0.0005 ***
traitza_children:maternallossTRUE -0.499539 -0.566825 -0.430637
2000 <0.0005 ***
traitza_children:center(nr.siblings) -0.023723 -0.028676 -0.018746
1848 <0.0005 ***
traitza_children:birth.cohort(1703,1722] -0.026012 -0.074250 0.026024
2000 0.319
traitza_children:birth.cohort(1722,1734] -0.279418 -0.329462 -0.227187
2000 <0.0005 ***
traitza_children:birth.cohort(1734,1743] -0.260165 -0.312659 -0.204462
2130 <0.0005 ***
traitza_children:birth.cohort(1743,1750] -0.481457 -0.534568 -0.426648
2000 <0.0005 ***
traitza_children:urban -0.604108 -0.645169 -0.562554
1702 <0.0005 ***
traitza_children:male -0.414988 -0.444589 -0.387005
2000 <0.0005 ***
traitza_children:paternalage.mean 0.006545 -0.018570 0.036227
2000 0.651
traitza_children:paternalage.diff -0.097982 -0.136302 -0.060677
2000 <0.0005 ***
On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:
> Ruben Arslan <rubenarslan at ...> writes:
>
> >
> > Dear list,
> >
> > I wanted to ask: Is there any (maybe just back of the envelope) way to
> > obtain a response prediction for zero-inflated or hurdle type models?
> > I've fit such models in MCMCglmm, but I don't work in ecology and my
> > previous experience with explaining such models to "my audience" did not
> > bode well. When it comes to humans, the researchers I presented to are
> not
> > used to offspring count being zero-inflated (or acquainted with that
> > concept), but in my historical data with high infant mortality, it is (in
> > modern data it's actually slightly underdispersed).
> >
> > Currently I'm using lme4 and simply splitting my models into two stages
> > (finding a mate and having offspring).
> > That's okay too, but in one population the effect of interest is not
> > clearly visible in either stage, only when both are taken together (but
> > then the outcome is zero-inflated).
> > I expect to be given a hard time for this and hence thought I'd use a
> > binomial model with the outcome offspring>0 as my main model, but that
> > turns out to be hard to explain too and doesn't
> > really do the data justice.
> >
> > Basically I don't want to be forced to discuss my smallest population as
> a
> > non-replication of the effect because I was insufficiently able to
> explain
> > the statistics behind my reasoning that the effect shows.
>
> I think the back-of-the envelope answer would be that for a two-stage
> model with a prediction of p_i for the probability of having a non-zero
> response (or in the case of zero-inflated models, the probability of
> _not_ having a structural zero) and a prediction of n_i for the conditional
> part of the model, the mean predicted value is p_i*n_i and the
> variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
> (this is assuming
> that you haven't built in any correlation between p_i and n_i, which
> would be hard in lme4 but _might_ be possible under certain circumstances
> via a multitype model in MCMCglmm).
>
> Does that help?
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list