[R-sig-ME] Predictions from zero-inflated or hurdle models

Tue Mar 10 13:22:26 CET 2015

Dear Dr Bolker,

I'd thought about something like this, one point of asking was to see whether
a) it's implemented already, because I'll probably make dumb mistakes while
trying b) it's not implemented because it's a bad idea.
Your response and the MCMCglmm course notes make me hope that it's c) not
implemented because nobody did yet or d) it's so simple that everybody does
it on-the-fly.

So I tried my hand and would appreciate corrections. I am sure there is
some screw-up or an inelegant approach in there.
I included code for dealing with mcmc.lists because that's what I have and
I'm not entirely sure how I deal with them is correct either.

I started with a zero-altered model, because those fit fastest and
according to the course notes have the least complex likelihood.
Because I know not what I do, I'm not dealing with my random effects at all.

I pasted a model summary below to show what I've applied the below function
to. The function gives the following quantiles when applied to 19 chains of
that model.
         5%         50%         95%
5.431684178 5.561211207 5.690655200
         5%         50%         95%
5.003974382 5.178192327 5.348246558

Warm regards,

Ruben Arslan

HPDpredict_za = function(object, predictor) {

if(class(object) != "MCMCglmm") {
if(length( object[[1]]$Residual$nrt )>1) {
object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
}
Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
vars = colnames(Sol[[1]])
} else {
Sol = as.data.frame(object$Sol)
vars = names(Sol)
}
za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
"traitza_"]
za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
"traitza_"]
 intercept = Sol[,"(Intercept)"]
za_intercept = Sol[, za_intercept_name]
l1 = Sol[, predictor ]
l2 = Sol[, za_predictor ]
if(is.list(object)) {
intercept = unlist(intercept)
za_intercept = unlist(za_intercept)
l1 = unlist(l1)
l2 = unlist(l2)
}
 py_0 = dpois(0, exp(intercept + za_intercept))
y_ygt0 = exp(intercept)
at_intercept = (1-py_0) * y_ygt0

py_0 = dpois(0, exp(intercept + za_intercept + l2))
y_ygt0 = exp(intercept +  l1)
at_predictor_1 = (1-py_0) * y_ygt0
print(qplot(at_intercept))
print(qplot(at_predictor_1))
df = data.frame("intercept" = at_intercept)
df[, predictor] = at_predictor_1
print(qplot(x=variable,
y=value,data=suppressMessages(melt(df)),fill=variable,alpha=I(0.40), geom =
'violin'))
print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
invisible(df)
}

> summary(object[[1]])

 Iterations = 100001:299901
 Thinning interval  = 100
 Sample size  = 2000

 DIC: 349094

 G-structure:  ~idh(trait):idParents

                      post.mean l-95% CI u-95% CI eff.samp
children.idParents       0.0189   0.0164   0.0214     1729
za_children.idParents    0.2392   0.2171   0.2622     1647

 R-structure:  ~idh(trait):units

                  post.mean l-95% CI u-95% CI eff.samp
children.units        0.144    0.139    0.148     1715
za_children.units     1.000    1.000    1.000        0

 Location effects: children ~ trait * (maternalage.factor + paternalloss +
maternalloss + center(nr.siblings) + birth.cohort + urban + male +
paternalage.mean + paternalage.diff)

                                           post.mean  l-95% CI  u-95% CI
eff.samp   pMCMC
(Intercept)                                 2.088717  2.073009  2.103357
  2000 <0.0005 ***
traitza_children                           -1.933491 -1.981945 -1.887863
  2000 <0.0005 ***
maternalage.factor(14,20]                   0.007709 -0.014238  0.027883
  1500   0.460
maternalage.factor(35,50]                   0.006350 -0.009634  0.024107
  2000   0.462
paternallossTRUE                            0.000797 -0.022716  0.025015
  2000   0.925
maternallossTRUE                           -0.015542 -0.040240  0.009549
  2000   0.226
center(nr.siblings)                         0.005869  0.004302  0.007510
  2000 <0.0005 ***
birth.cohort(1703,1722]                    -0.045487 -0.062240 -0.028965
  2000 <0.0005 ***
birth.cohort(1722,1734]                    -0.055872 -0.072856 -0.036452
  2000 <0.0005 ***
birth.cohort(1734,1743]                    -0.039770 -0.056580 -0.020907
  2000 <0.0005 ***
birth.cohort(1743,1750]                    -0.030713 -0.048301 -0.012214
  2000   0.002 **
urban                                      -0.076748 -0.093240 -0.063002
  2567 <0.0005 ***
male                                        0.106074  0.095705  0.115742
  2000 <0.0005 ***
paternalage.mean                           -0.024119 -0.033133 -0.014444
  2000 <0.0005 ***
paternalage.diff                           -0.018367 -0.032083 -0.005721
  2000   0.007 **
traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978
  1876   0.001 ***
traitza_children:maternalage.factor(35,50] -0.045196 -0.094485  0.002640
  2000   0.075 .
traitza_children:paternallossTRUE          -0.171957 -0.238218 -0.104820
  2000 <0.0005 ***
traitza_children:maternallossTRUE          -0.499539 -0.566825 -0.430637
  2000 <0.0005 ***
traitza_children:center(nr.siblings)       -0.023723 -0.028676 -0.018746
  1848 <0.0005 ***
traitza_children:birth.cohort(1703,1722]   -0.026012 -0.074250  0.026024
  2000   0.319
traitza_children:birth.cohort(1722,1734]   -0.279418 -0.329462 -0.227187
  2000 <0.0005 ***
traitza_children:birth.cohort(1734,1743]   -0.260165 -0.312659 -0.204462
  2130 <0.0005 ***
traitza_children:birth.cohort(1743,1750]   -0.481457 -0.534568 -0.426648
  2000 <0.0005 ***
traitza_children:urban                     -0.604108 -0.645169 -0.562554
  1702 <0.0005 ***
traitza_children:male                      -0.414988 -0.444589 -0.387005
  2000 <0.0005 ***
traitza_children:paternalage.mean           0.006545 -0.018570  0.036227
  2000   0.651
traitza_children:paternalage.diff          -0.097982 -0.136302 -0.060677
  2000 <0.0005 ***

On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:

> Ruben Arslan <rubenarslan at ...> writes:
>
> >
> > Dear list,
> >
> > I wanted to ask: Is there any (maybe just back of the envelope) way to
> > obtain a response prediction for zero-inflated or hurdle type models?
> > I've fit such models in MCMCglmm, but I don't work in ecology and my
> > previous experience with explaining such models to "my audience" did not
> > bode well. When it comes to humans, the researchers I presented to are
> not
> > used to offspring count being zero-inflated (or acquainted with that
> > concept), but in my historical data with high infant mortality, it is (in
> > modern data it's actually slightly underdispersed).
> >
> > Currently I'm using lme4 and simply splitting my models into two stages
> > (finding a mate and having offspring).
> > That's okay too, but in one population the effect of interest is not
> > clearly visible in either stage, only when both are taken together (but
> > then the outcome is zero-inflated).
> > I expect to be given a hard time for this and hence thought I'd use a
> > binomial model with the outcome offspring>0 as my main model, but that
> > turns out to be hard to explain too and doesn't
> > really do the data justice.
> >
> > Basically I don't want to be forced to discuss my smallest population as
> a
> > non-replication of the effect because I was insufficiently able to
> explain
> > the statistics behind my reasoning that the effect shows.
>
>   I think the back-of-the envelope answer would be that for a two-stage
> model with a prediction of p_i for the probability of having a non-zero
> response (or in the case of zero-inflated models, the probability of
> _not_ having a structural zero) and a prediction of n_i for the conditional
> part of the model, the mean predicted value is p_i*n_i and the
> variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
> (this is assuming
> that you haven't built in any correlation between p_i and n_i, which
> would be hard in lme4 but _might_ be possible under certain circumstances
> via a multitype model in MCMCglmm).
>
>   Does that help?
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]