[R-sig-ME] Predictions from zero-inflated or hurdle models

Ruben Arslan rubenarslan at gmail.com
Tue Mar 17 14:33:25 CET 2015


Dear list,

I've made a reproducible example of the zero-altered prediction,
in the hope that someone will have a look and reassure me that I'm going
about this the right way.
I was a bit confused by the point about p_i and n_i being correlated (they
are in my case), but I think this was a red herring for me
since I'm not deriving the variance analytically.
The script is here: https://gist.github.com/rubenarslan/aeacdd306b3d061819a6
and if you don't want to run the simulation fit yourself, I've put an RDS
file of the fit here:
https://dl.dropboxusercontent.com/u/1078620/m1.rds

Best regards,

Ruben Arslan

On Tue, Mar 10, 2015 at 1:22 PM Ruben Arslan <rubenarslan at gmail.com> wrote:

> Dear Dr Bolker,
>
> I'd thought about something like this, one point of asking was to see whether
> a) it's implemented already, because I'll probably make dumb mistakes while
> trying b) it's not implemented because it's a bad idea.
> Your response and the MCMCglmm course notes make me hope that it's c) not
> implemented because nobody did yet or d) it's so simple that everybody does
> it on-the-fly.
>
> So I tried my hand and would appreciate corrections. I am sure there is
> some screw-up or an inelegant approach in there.
> I included code for dealing with mcmc.lists because that's what I have and
> I'm not entirely sure how I deal with them is correct either.
>
> I started with a zero-altered model, because those fit fastest and
> according to the course notes have the least complex likelihood.
> Because I know not what I do, I'm not dealing with my random effects at
> all.
>
> I pasted a model summary below to show what I've applied the below
> function to. The function gives the following quantiles when applied to 19
> chains of that model.
>          5%         50%         95%
> 5.431684178 5.561211207 5.690655200
>          5%         50%         95%
> 5.003974382 5.178192327 5.348246558
>
> Warm regards,
>
> Ruben Arslan
>
> HPDpredict_za = function(object, predictor) {
>
> if(class(object) != "MCMCglmm") {
> if(length( object[[1]]$Residual$nrt )>1) {
> object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
> }
> Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
> vars = colnames(Sol[[1]])
> } else {
> Sol = as.data.frame(object$Sol)
> vars = names(Sol)
> }
> za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
> "traitza_"]
> za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
> "traitza_"]
>  intercept = Sol[,"(Intercept)"]
> za_intercept = Sol[, za_intercept_name]
> l1 = Sol[, predictor ]
> l2 = Sol[, za_predictor ]
> if(is.list(object)) {
> intercept = unlist(intercept)
> za_intercept = unlist(za_intercept)
> l1 = unlist(l1)
> l2 = unlist(l2)
> }
>  py_0 = dpois(0, exp(intercept + za_intercept))
> y_ygt0 = exp(intercept)
> at_intercept = (1-py_0) * y_ygt0
>
> py_0 = dpois(0, exp(intercept + za_intercept + l2))
> y_ygt0 = exp(intercept +  l1)
> at_predictor_1 = (1-py_0) * y_ygt0
> print(qplot(at_intercept))
> print(qplot(at_predictor_1))
> df = data.frame("intercept" = at_intercept)
> df[, predictor] = at_predictor_1
> print(qplot(x=variable,
> y=value,data=suppressMessages(melt(df)),fill=variable,alpha=I(0.40), geom =
> 'violin'))
> print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
> print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
> invisible(df)
> }
>
>
> > summary(object[[1]])
>
>  Iterations = 100001:299901
>  Thinning interval  = 100
>  Sample size  = 2000
>
>  DIC: 349094
>
>  G-structure:  ~idh(trait):idParents
>
>                       post.mean l-95% CI u-95% CI eff.samp
> children.idParents       0.0189   0.0164   0.0214     1729
> za_children.idParents    0.2392   0.2171   0.2622     1647
>
>  R-structure:  ~idh(trait):units
>
>                   post.mean l-95% CI u-95% CI eff.samp
> children.units        0.144    0.139    0.148     1715
> za_children.units     1.000    1.000    1.000        0
>
>  Location effects: children ~ trait * (maternalage.factor + paternalloss +
> maternalloss + center(nr.siblings) + birth.cohort + urban + male +
> paternalage.mean + paternalage.diff)
>
>                                            post.mean  l-95% CI  u-95% CI
> eff.samp   pMCMC
> (Intercept)                                 2.088717  2.073009  2.103357
>   2000 <0.0005 ***
> traitza_children                           -1.933491 -1.981945 -1.887863
>   2000 <0.0005 ***
> maternalage.factor(14,20]                   0.007709 -0.014238  0.027883
>   1500   0.460
> maternalage.factor(35,50]                   0.006350 -0.009634  0.024107
>   2000   0.462
> paternallossTRUE                            0.000797 -0.022716  0.025015
>   2000   0.925
> maternallossTRUE                           -0.015542 -0.040240  0.009549
>   2000   0.226
> center(nr.siblings)                         0.005869  0.004302  0.007510
>   2000 <0.0005 ***
> birth.cohort(1703,1722]                    -0.045487 -0.062240 -0.028965
>   2000 <0.0005 ***
> birth.cohort(1722,1734]                    -0.055872 -0.072856 -0.036452
>   2000 <0.0005 ***
> birth.cohort(1734,1743]                    -0.039770 -0.056580 -0.020907
>   2000 <0.0005 ***
> birth.cohort(1743,1750]                    -0.030713 -0.048301 -0.012214
>   2000   0.002 **
> urban                                      -0.076748 -0.093240 -0.063002
>   2567 <0.0005 ***
> male                                        0.106074  0.095705  0.115742
>   2000 <0.0005 ***
> paternalage.mean                           -0.024119 -0.033133 -0.014444
>   2000 <0.0005 ***
> paternalage.diff                           -0.018367 -0.032083 -0.005721
>   2000   0.007 **
> traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978
>   1876   0.001 ***
> traitza_children:maternalage.factor(35,50] -0.045196 -0.094485  0.002640
>   2000   0.075 .
> traitza_children:paternallossTRUE          -0.171957 -0.238218 -0.104820
>   2000 <0.0005 ***
> traitza_children:maternallossTRUE          -0.499539 -0.566825 -0.430637
>   2000 <0.0005 ***
> traitza_children:center(nr.siblings)       -0.023723 -0.028676 -0.018746
>   1848 <0.0005 ***
> traitza_children:birth.cohort(1703,1722]   -0.026012 -0.074250  0.026024
>   2000   0.319
> traitza_children:birth.cohort(1722,1734]   -0.279418 -0.329462 -0.227187
>   2000 <0.0005 ***
> traitza_children:birth.cohort(1734,1743]   -0.260165 -0.312659 -0.204462
>   2130 <0.0005 ***
> traitza_children:birth.cohort(1743,1750]   -0.481457 -0.534568 -0.426648
>   2000 <0.0005 ***
> traitza_children:urban                     -0.604108 -0.645169 -0.562554
>   1702 <0.0005 ***
> traitza_children:male                      -0.414988 -0.444589 -0.387005
>   2000 <0.0005 ***
> traitza_children:paternalage.mean           0.006545 -0.018570  0.036227
>   2000   0.651
> traitza_children:paternalage.diff          -0.097982 -0.136302 -0.060677
>   2000 <0.0005 ***
>
>
> On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:
>
>> Ruben Arslan <rubenarslan at ...> writes:
>>
>> >
>> > Dear list,
>> >
>> > I wanted to ask: Is there any (maybe just back of the envelope) way to
>> > obtain a response prediction for zero-inflated or hurdle type models?
>> > I've fit such models in MCMCglmm, but I don't work in ecology and my
>> > previous experience with explaining such models to "my audience" did not
>> > bode well. When it comes to humans, the researchers I presented to are
>> not
>> > used to offspring count being zero-inflated (or acquainted with that
>> > concept), but in my historical data with high infant mortality, it is
>> (in
>> > modern data it's actually slightly underdispersed).
>> >
>> > Currently I'm using lme4 and simply splitting my models into two stages
>> > (finding a mate and having offspring).
>> > That's okay too, but in one population the effect of interest is not
>> > clearly visible in either stage, only when both are taken together (but
>> > then the outcome is zero-inflated).
>> > I expect to be given a hard time for this and hence thought I'd use a
>> > binomial model with the outcome offspring>0 as my main model, but that
>> > turns out to be hard to explain too and doesn't
>> > really do the data justice.
>> >
>> > Basically I don't want to be forced to discuss my smallest population
>> as a
>> > non-replication of the effect because I was insufficiently able to
>> explain
>> > the statistics behind my reasoning that the effect shows.
>>
>>   I think the back-of-the envelope answer would be that for a two-stage
>> model with a prediction of p_i for the probability of having a non-zero
>> response (or in the case of zero-inflated models, the probability of
>> _not_ having a structural zero) and a prediction of n_i for the
>> conditional
>> part of the model, the mean predicted value is p_i*n_i and the
>> variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
>> (this is assuming
>> that you haven't built in any correlation between p_i and n_i, which
>> would be hard in lme4 but _might_ be possible under certain circumstances
>> via a multitype model in MCMCglmm).
>>
>>   Does that help?
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list