[R-sig-ME] Predictions from zero-inflated or hurdle models
Ruben Arslan
rubenarslan at gmail.com
Tue Mar 17 14:33:25 CET 2015
Dear list,
I've made a reproducible example of the zero-altered prediction,
in the hope that someone will have a look and reassure me that I'm going
about this the right way.
I was a bit confused by the point about p_i and n_i being correlated (they
are in my case), but I think this was a red herring for me
since I'm not deriving the variance analytically.
The script is here: https://gist.github.com/rubenarslan/aeacdd306b3d061819a6
and if you don't want to run the simulation fit yourself, I've put an RDS
file of the fit here:
https://dl.dropboxusercontent.com/u/1078620/m1.rds
Best regards,
Ruben Arslan
On Tue, Mar 10, 2015 at 1:22 PM Ruben Arslan <rubenarslan at gmail.com> wrote:
> Dear Dr Bolker,
>
> I'd thought about something like this, one point of asking was to see whether
> a) it's implemented already, because I'll probably make dumb mistakes while
> trying b) it's not implemented because it's a bad idea.
> Your response and the MCMCglmm course notes make me hope that it's c) not
> implemented because nobody did yet or d) it's so simple that everybody does
> it on-the-fly.
>
> So I tried my hand and would appreciate corrections. I am sure there is
> some screw-up or an inelegant approach in there.
> I included code for dealing with mcmc.lists because that's what I have and
> I'm not entirely sure how I deal with them is correct either.
>
> I started with a zero-altered model, because those fit fastest and
> according to the course notes have the least complex likelihood.
> Because I know not what I do, I'm not dealing with my random effects at
> all.
>
> I pasted a model summary below to show what I've applied the below
> function to. The function gives the following quantiles when applied to 19
> chains of that model.
> 5% 50% 95%
> 5.431684178 5.561211207 5.690655200
> 5% 50% 95%
> 5.003974382 5.178192327 5.348246558
>
> Warm regards,
>
> Ruben Arslan
>
> HPDpredict_za = function(object, predictor) {
>
> if(class(object) != "MCMCglmm") {
> if(length( object[[1]]$Residual$nrt )>1) {
> object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
> }
> Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
> vars = colnames(Sol[[1]])
> } else {
> Sol = as.data.frame(object$Sol)
> vars = names(Sol)
> }
> za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
> "traitza_"]
> za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
> "traitza_"]
> intercept = Sol[,"(Intercept)"]
> za_intercept = Sol[, za_intercept_name]
> l1 = Sol[, predictor ]
> l2 = Sol[, za_predictor ]
> if(is.list(object)) {
> intercept = unlist(intercept)
> za_intercept = unlist(za_intercept)
> l1 = unlist(l1)
> l2 = unlist(l2)
> }
> py_0 = dpois(0, exp(intercept + za_intercept))
> y_ygt0 = exp(intercept)
> at_intercept = (1-py_0) * y_ygt0
>
> py_0 = dpois(0, exp(intercept + za_intercept + l2))
> y_ygt0 = exp(intercept + l1)
> at_predictor_1 = (1-py_0) * y_ygt0
> print(qplot(at_intercept))
> print(qplot(at_predictor_1))
> df = data.frame("intercept" = at_intercept)
> df[, predictor] = at_predictor_1
> print(qplot(x=variable,
> y=value,data=suppressMessages(melt(df)),fill=variable,alpha=I(0.40), geom =
> 'violin'))
> print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
> print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
> invisible(df)
> }
>
>
> > summary(object[[1]])
>
> Iterations = 100001:299901
> Thinning interval = 100
> Sample size = 2000
>
> DIC: 349094
>
> G-structure: ~idh(trait):idParents
>
> post.mean l-95% CI u-95% CI eff.samp
> children.idParents 0.0189 0.0164 0.0214 1729
> za_children.idParents 0.2392 0.2171 0.2622 1647
>
> R-structure: ~idh(trait):units
>
> post.mean l-95% CI u-95% CI eff.samp
> children.units 0.144 0.139 0.148 1715
> za_children.units 1.000 1.000 1.000 0
>
> Location effects: children ~ trait * (maternalage.factor + paternalloss +
> maternalloss + center(nr.siblings) + birth.cohort + urban + male +
> paternalage.mean + paternalage.diff)
>
> post.mean l-95% CI u-95% CI
> eff.samp pMCMC
> (Intercept) 2.088717 2.073009 2.103357
> 2000 <0.0005 ***
> traitza_children -1.933491 -1.981945 -1.887863
> 2000 <0.0005 ***
> maternalage.factor(14,20] 0.007709 -0.014238 0.027883
> 1500 0.460
> maternalage.factor(35,50] 0.006350 -0.009634 0.024107
> 2000 0.462
> paternallossTRUE 0.000797 -0.022716 0.025015
> 2000 0.925
> maternallossTRUE -0.015542 -0.040240 0.009549
> 2000 0.226
> center(nr.siblings) 0.005869 0.004302 0.007510
> 2000 <0.0005 ***
> birth.cohort(1703,1722] -0.045487 -0.062240 -0.028965
> 2000 <0.0005 ***
> birth.cohort(1722,1734] -0.055872 -0.072856 -0.036452
> 2000 <0.0005 ***
> birth.cohort(1734,1743] -0.039770 -0.056580 -0.020907
> 2000 <0.0005 ***
> birth.cohort(1743,1750] -0.030713 -0.048301 -0.012214
> 2000 0.002 **
> urban -0.076748 -0.093240 -0.063002
> 2567 <0.0005 ***
> male 0.106074 0.095705 0.115742
> 2000 <0.0005 ***
> paternalage.mean -0.024119 -0.033133 -0.014444
> 2000 <0.0005 ***
> paternalage.diff -0.018367 -0.032083 -0.005721
> 2000 0.007 **
> traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978
> 1876 0.001 ***
> traitza_children:maternalage.factor(35,50] -0.045196 -0.094485 0.002640
> 2000 0.075 .
> traitza_children:paternallossTRUE -0.171957 -0.238218 -0.104820
> 2000 <0.0005 ***
> traitza_children:maternallossTRUE -0.499539 -0.566825 -0.430637
> 2000 <0.0005 ***
> traitza_children:center(nr.siblings) -0.023723 -0.028676 -0.018746
> 1848 <0.0005 ***
> traitza_children:birth.cohort(1703,1722] -0.026012 -0.074250 0.026024
> 2000 0.319
> traitza_children:birth.cohort(1722,1734] -0.279418 -0.329462 -0.227187
> 2000 <0.0005 ***
> traitza_children:birth.cohort(1734,1743] -0.260165 -0.312659 -0.204462
> 2130 <0.0005 ***
> traitza_children:birth.cohort(1743,1750] -0.481457 -0.534568 -0.426648
> 2000 <0.0005 ***
> traitza_children:urban -0.604108 -0.645169 -0.562554
> 1702 <0.0005 ***
> traitza_children:male -0.414988 -0.444589 -0.387005
> 2000 <0.0005 ***
> traitza_children:paternalage.mean 0.006545 -0.018570 0.036227
> 2000 0.651
> traitza_children:paternalage.diff -0.097982 -0.136302 -0.060677
> 2000 <0.0005 ***
>
>
> On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:
>
>> Ruben Arslan <rubenarslan at ...> writes:
>>
>> >
>> > Dear list,
>> >
>> > I wanted to ask: Is there any (maybe just back of the envelope) way to
>> > obtain a response prediction for zero-inflated or hurdle type models?
>> > I've fit such models in MCMCglmm, but I don't work in ecology and my
>> > previous experience with explaining such models to "my audience" did not
>> > bode well. When it comes to humans, the researchers I presented to are
>> not
>> > used to offspring count being zero-inflated (or acquainted with that
>> > concept), but in my historical data with high infant mortality, it is
>> (in
>> > modern data it's actually slightly underdispersed).
>> >
>> > Currently I'm using lme4 and simply splitting my models into two stages
>> > (finding a mate and having offspring).
>> > That's okay too, but in one population the effect of interest is not
>> > clearly visible in either stage, only when both are taken together (but
>> > then the outcome is zero-inflated).
>> > I expect to be given a hard time for this and hence thought I'd use a
>> > binomial model with the outcome offspring>0 as my main model, but that
>> > turns out to be hard to explain too and doesn't
>> > really do the data justice.
>> >
>> > Basically I don't want to be forced to discuss my smallest population
>> as a
>> > non-replication of the effect because I was insufficiently able to
>> explain
>> > the statistics behind my reasoning that the effect shows.
>>
>> I think the back-of-the envelope answer would be that for a two-stage
>> model with a prediction of p_i for the probability of having a non-zero
>> response (or in the case of zero-inflated models, the probability of
>> _not_ having a structural zero) and a prediction of n_i for the
>> conditional
>> part of the model, the mean predicted value is p_i*n_i and the
>> variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
>> (this is assuming
>> that you haven't built in any correlation between p_i and n_i, which
>> would be hard in lme4 but _might_ be possible under certain circumstances
>> via a multitype model in MCMCglmm).
>>
>> Does that help?
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list