[R-sig-ME] Predictions from zero-inflated or hurdle models
Jarrod Hadfield
j.hadfield at ed.ac.uk
Sat Mar 28 08:35:56 CET 2015
Hi Ruben,
Regarding 2/ you want to add the family variance if you want to
predict what the average response will be across families. If you want
to predict for a specific family (for example one where the random
effect = 0) then you should not include the variance.
Cheers,
Jarrod
Quoting Ruben Arslan <rubenarslan at gmail.com> on Wed, 18 Mar 2015
13:51:50 +0000:
> Hi,
>
> thanks for your reply. This clears some things up.
>
> 1. Colour me interested then, if you ever find the time to put it on
> Github. Maybe after seeing the nonsense I've wrought in the following
> points you might find sending me a zip of the pre-release might save you
> some time as opposed to guessing where I went wrong?
>
> 2. I'm not sure I understand.
> You're saying I can add the VCV components for the family random effect to
> the respective unit components.
> But I would only want that if I was interested in predicting the effect for
> any family, which in my case is not the goal, since my population data
> includes all families and I want to know the effect for the average family
> in that population (to compare it to other populations).
>
> 3. Oh right, I'm sorry I think I got a little confused/sidetracked, these
> models have been with me for a while now and I learned about us()/idh()
> back then.
>
> 4. Okay, I will keep this in mind, but am for now focusing on getting the
> prediction for this simpler case right.
>
> 5. I made a reproducible report
> <http://rpubs.com/rubenarslan/repro_variab_mcmcglmm> to show the problem. I
> think the problem is not due to the changes I made in my prediction helper
> function but to some fundamental.
> It shows that for some toy data, the credibility intervals for predictions
> depend on the reference I choose for a factor predictor.
> This is less bad with the method I used initially simply using the inverse
> link.
> However, both methods _underpredict_ as far as I can tell.
>
> Surely I'm missing an important step, but I hope this illustrates what
> confuses me. Of course the reasons you map out for the discrepancy
> (variable confounding etc) are sound in theory, but I don't see these wide
> CIs using any other method (simple descriptives, lme4).
> Oh and I used a different data simulation, because the one you used
> actually induces zero-deflation as far as I can tell (a lot of zeroes, but
> lambda is very low too).
>
> Best regards and again thanks so much for taking the time,
>
> Ruben
>
>
> On Wed, Mar 18, 2015 at 7:06 AM Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>
>> Hi,
>>
>> 6/ you are correct - my mistake.
>>
>>
>> 5b/ I'm still not sure what you mean by the uncertainty, but ignoring
>> the random effects will result in under prediction. For a Poisson
>> log-normal the mean is exp(a_1+v_1/2) rather than exp(a_1) for the
>> standard Poisson.
>>
>> Cheers,
>>
>> Jarrod
>>
>>
>>
>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Tue, 17 Mar 2015
>> 19:21:06 +0000:
>>
>> > Hi,
>> >
>> > two more questions, sorry.
>> >
>> > 6.
>> > could it be that where you wrote
>> > normal.evd<-function(x, mu, v){
>> > exp(-exp(x))*dnorm(x, mu, sqrt(v))
>> > }
>> >
>> > you're not actually passing the variance but the standard deviation? so
>> the
>> > functions should be
>> >
>> > normal.evd<-function(x, mu, sd){
>> > exp(-exp(x))*dnorm(x, mu, sd)
>> > }
>> > normal.zt<-function(x, mu, sd){
>> > exp(x)/(1-exp(-exp(x)))*dnorm(x, mu, sd)
>> > }
>> >
>> > Because at least in the example, you were taking the square root here
>> > already: sd_1<-mean(sqrt(m1$VCV[,1]))
>> > Maybe this was a copy-paste from the source where you do it differently?
>> Or
>> > I misunderstood...
>> >
>> > An addition to question 5:
>> > 5. When I use the correct inverse link function in my old code, but don't
>> > do double integration, I do get a "reasonable" amount of uncertainty
>> around
>> > the reference factor. The predicted means are too low, which I put down
>> to
>> > this approach ignoring the VCV. I was just wondering how this could
>> happen
>> > (actually seeing the "reasonable" amount of uncertainty was what made me
>> > hopeful that my approach might not be entirely wrong).
>> >
>> > Best,
>> >
>> > Ruben
>> >
>> >
>> > On Tue, Mar 17, 2015 at 7:53 PM Ruben Arslan <rubenarslan at gmail.com>
>> wrote:
>> >
>> >> Hi Jarrod,
>> >>
>> >> thanks for the extensive reply! This helps a lot, though it sounds like
>> I
>> >> was hubristic to attempt this myself.
>> >> I tried using the approach you mapped out in the function gist
>> >> <https://gist.github.com/rubenarslan/aeacdd306b3d061819a6> I posted. I
>> >> simply put the pred function in a loop, so that I wouldn't make any
>> >> mistakes while vectorising and since I don't care about performance at
>> this
>> >> point.
>> >>
>> >> Of course, I have some follow up questions though.. I'm sorry if I'm
>> being
>> >> a pain, I really appreciate the free advice, but understand of course if
>> >> other things take precedence.
>> >>
>> >> 1. You're not "down with" developing publicly are you? Because I sure
>> >> would like to test-drive the newdata prediction and simulation
>> functions..
>> >>
>> >> 2. Could you make sure that I got this right: "When predictions are to
>> be
>> >> taken after marginalising the random effects (including the `residual'
>> >> over-dispersion) it is not possible to obtain closed form expressions."
>> >> That is basically my scenario, right? In the example I included, I also
>> >> had a group-level random effect (family). Ore are you talking about the
>> >> "trait" as the random effect (as in your example) and my scenario is
>> >> different and I cannot apply the numerical double integration procedure
>> you
>> >> posted?
>> >> To be clear about my prediction goal without using language that I might
>> >> be using incorrectly: I want to show what the average effect in the
>> >> response unit, number of children, is in my population(s). I have data
>> on
>> >> whole populations and am using all of it (except individuals that don't
>> >> have completed fertility yet, because I have yet to find a way to model
>> >> both zero-inflation and right censoring).
>> >>
>> >> 3. "Numerical integration could be extended to double integration in
>> >> which case covariance between the Poisson part and the binary part could
>> >> be handled." That is what you posted an example of and it applies to my
>> >> scenario, because I specified a prior R=list(V=diag(2), nu=1.002, fix=2)
>> >> and rcov=~idh(trait):units, random=~idh(trait):idParents?
>> >> But this double integration approach is something you just wrote
>> >> off-the-cuff and I probably shouldn't use it in a publication? Or is
>> this
>> >> in the forthcoming MCMCglmm release and I might actually be able to
>> refer
>> >> to it once I get to submitting?
>> >>
>> >> 4. Could I change my model specification to forbid covariance between
>> the
>> >> two parts and not shoot myself in the foot? Would this allow for a more
>> >> valid/tested approach to prediction?
>> >>
>> >> 5. When I use your method on my real data, I get less variation around
>> the
>> >> prediction "for the reference level" than for all other factor levels.
>> >> My reference level actually has fewer cases than the others, so this
>> >> isn't "right" in a way.
>> >> Is this because I'm not doing newdata prediction? I get the "right"
>> >> looking uncertainty if I bootstrap newdata predictions in lme4,
>> >> Sorry if this is children's logic :-)
>> >> Here's an image of the prediction
>> >> <http://rpubs.com/rubenarslan/mcmcglmm_pred> and the raw data
>> >> <http://rpubs.com/rubenarslan/raw>.
>> >>
>> >> Many thanks for any answers that you feel inclined to give.
>> >>
>> >> Best wishes,
>> >>
>> >> Ruben
>> >>
>> >> On Tue, Mar 17, 2015 at 5:31 PM Jarrod Hadfield <j.hadfield at ed.ac.uk>
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Sorry - I should have replied to this post earlier. I've been working
>> >>> on predict/simulate methods for all MCMcglmm models (including
>> >>> zero-inflated/altered/hurdle/truncated models) and these are now
>> >>> largely complete (together with newdata options).
>> >>>
>> >>> When predictions are to be taken after marginalising the random
>> >>> effects (including the `residual' over-dispersion) it is not possible
>> >>> to obtain closed form expressions. The options that will be available
>> >>> in MCMCglmm are:
>> >>>
>> >>> 1) algebraic approximation
>> >>> 2) numerical integration
>> >>> 3) simulation
>> >>>
>> >>> 1) and 2) are currently only accurate when the random/residual effect
>> >>> structure implies no covariance between the Poisson part and the
>> >>> binary part.
>> >>>
>> >>> 1) is reasonably accurate for zero-inflated distributions, but can be
>> >>> pretty poor for the remainder because they all involve zero-truncated
>> >>> Poisson log-normal distributions and my taylor approximation for the
>> >>> mean is less than ideal (any suggestions would be helpful).
>> >>>
>> >>> 2) could be extended to double integration in which case covariance
>> >>> between the Poisson part and the binary part could be handled.
>> >>>
>> >>> In your code, part of the problem is that you have fitted a zapoisson,
>> >>> but the prediction is based on a zipoisson (with complementary log-log
>> >>> link rather than logt link).
>> >>>
>> >>> In all zero-inflated/altered/hurdle/truncated models
>> >>>
>> >>> E[y] = E[(1-prob)*meanc]
>> >>>
>> >>> where prob is the probabilty of a zero in the binary part and meanc is
>> >>> the mean of a Poisson distribution (zipoisson) or a zero-truncated
>> >>> poisson (zapoisson and hupoisson). If we have eta_1 as the linear
>> >>> predictor for the poisson part and eta_2 as the linear predictor for
>> >>> the binary part:
>> >>>
>> >>> In zipoisson: prob = plogis(eta_2) and meanc = exp(eta_1)
>> >>> In zapoisson: prob = exp(-exp(eta_2)) and meanc =
>> >>> exp(eta_1)/(1-exp(-exp(eta_1)))
>> >>> In hupoisson: prob = plogis(eta_2) and meanc =
>> >>> exp(eta_1)/(1-exp(-exp(eta_1)))
>> >>> In ztpoisson: prob = 0 and meanc =
>> >>> exp(eta_1)/(1-exp(-exp(eta_1)))
>> >>>
>> >>> In each case the linear predictor has a `fixed' part and a `random'
>> >>> part which I'll denote as `a' and `u' respectively. Ideally we would
>> >>> want
>> >>>
>> >>> E[(1-prob)*meanc] taken over u_1 & u_2
>> >>>
>> >>> if prob and meanc are independent this is a bit easier as
>> >>>
>> >>> E[y] = E[1-prob]E[meanc]
>> >>>
>> >>> and the two expectations ony need to taken with repsect to their
>> >>> respective random effects. If we have sd_1 and sd_2 as the standard
>> >>> deviations of the two sets of random effects then for the zapoisson:
>> >>>
>> >>> normal.evd<-function(x, mu, v){
>> >>> exp(-exp(x))*dnorm(x, mu, sqrt(v))
>> >>> }
>> >>> normal.zt<-function(x, mu, v){
>> >>> exp(x)/(1-exp(-exp(x)))*dnorm(x, mu, sqrt(v))
>> >>> }
>> >>>
>> >>> pred<-function(a_1, a_2, sd_1, sd_2){
>> >>> prob<-1-integrate(normal.evd, qnorm(0.0001, a_2,sd_2),
>> >>> qnorm(0.9999, a_2,sd_2), a_2,sd_2)[[1]]
>> >>> meanc<-integrate(normal.zt, qnorm(0.0001, a_1,sd_1),
>> >>> qnorm(0.9999, a_1,sd_1), a_1,sd_1)[[1]]
>> >>> prob*meanc
>> >>> }
>> >>>
>> >>> # gives the expected value with reasonable accuracy. As an example:
>> >>>
>> >>> x<-rnorm(300)
>> >>> l1<-rnorm(300, 1/2+x, sqrt(1))
>> >>> l2<-rnorm(300, 1-x, sqrt(1))
>> >>>
>> >>> y<-rbinom(300, 1, 1-exp(-exp(l2)))
>> >>> y[which(y==1)]<-qpois(runif(sum(y==1), dpois(0,
>> >>> exp(l1[which(y==1)])), 1), exp(l1[which(y==1)]))
>> >>> # cunning sampler from Peter Dalgaard (R-sig-mixed)
>> >>>
>> >>> data=data.frame(y=y, x=x)
>> >>> prior=list(R=list(V=diag(2), nu=0.002, fix=2))
>> >>>
>> >>> m1<-MCMCglmm(y~trait+trait:x-1, rcov=~idh(trait):units, data=data,
>> >>> family="zapoisson", prior=prior)
>> >>>
>> >>> b_1<-colMeans(m1$Sol)[c(1,3)]
>> >>> b_2<-colMeans(m1$Sol)[c(2,4)]
>> >>> sd_1<-mean(sqrt(m1$VCV[,1]))
>> >>> sd_2<-mean(sqrt(m1$VCV[,2]))
>> >>>
>> >>> # note it is more accurate to take the posterior mean prediction
>> >>> rather than the prediction from the posterior means as I've done here,
>> >>> but for illustration:
>> >>>
>> >>> x.pred<-seq(-3,3,length=100)
>> >>> p<-1:100
>> >>> for(i in 1:100){
>> >>> p[i]<-pred(a_1 = b_1[1]+x.pred[i]*b_1[2], a_2 =
>> >>> b_2[1]+x.pred[i]*b_2[2], sd_1=sd_1, sd_2=sd_2)
>> >>> }
>> >>>
>> >>> plot(y~x)
>> >>> lines(p~x.pred)
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Jarrod
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Tue, 17 Mar 2015
>> >>> 13:33:25 +0000:
>> >>>
>> >>> > Dear list,
>> >>> >
>> >>> > I've made a reproducible example of the zero-altered prediction,
>> >>> > in the hope that someone will have a look and reassure me that I'm
>> going
>> >>> > about this the right way.
>> >>> > I was a bit confused by the point about p_i and n_i being correlated
>> >>> (they
>> >>> > are in my case), but I think this was a red herring for me
>> >>> > since I'm not deriving the variance analytically.
>> >>> > The script is here: https://gist.github.com/rubenarslan/
>> >>> aeacdd306b3d061819a6
>> >>> > and if you don't want to run the simulation fit yourself, I've put an
>> >>> RDS
>> >>> > file of the fit here:
>> >>> > https://dl.dropboxusercontent.com/u/1078620/m1.rds
>> >>> >
>> >>> > Best regards,
>> >>> >
>> >>> > Ruben Arslan
>> >>> >
>> >>> > On Tue, Mar 10, 2015 at 1:22 PM Ruben Arslan <rubenarslan at gmail.com>
>> >>> wrote:
>> >>> >
>> >>> >> Dear Dr Bolker,
>> >>> >>
>> >>> >> I'd thought about something like this, one point of asking was to
>> >>> >> see whether
>> >>> >> a) it's implemented already, because I'll probably make dumb
>> mistakes
>> >>> while
>> >>> >> trying b) it's not implemented because it's a bad idea.
>> >>> >> Your response and the MCMCglmm course notes make me hope that it's
>> c)
>> >>> not
>> >>> >> implemented because nobody did yet or d) it's so simple that
>> everybody
>> >>> does
>> >>> >> it on-the-fly.
>> >>> >>
>> >>> >> So I tried my hand and would appreciate corrections. I am sure
>> there is
>> >>> >> some screw-up or an inelegant approach in there.
>> >>> >> I included code for dealing with mcmc.lists because that's what I
>> have
>> >>> and
>> >>> >> I'm not entirely sure how I deal with them is correct either.
>> >>> >>
>> >>> >> I started with a zero-altered model, because those fit fastest and
>> >>> >> according to the course notes have the least complex likelihood.
>> >>> >> Because I know not what I do, I'm not dealing with my random
>> effects at
>> >>> >> all.
>> >>> >>
>> >>> >> I pasted a model summary below to show what I've applied the below
>> >>> >> function to. The function gives the following quantiles when applied
>> >>> to 19
>> >>> >> chains of that model.
>> >>> >> 5% 50% 95%
>> >>> >> 5.431684178 5.561211207 5.690655200
>> >>> >> 5% 50% 95%
>> >>> >> 5.003974382 5.178192327 5.348246558
>> >>> >>
>> >>> >> Warm regards,
>> >>> >>
>> >>> >> Ruben Arslan
>> >>> >>
>> >>> >> HPDpredict_za = function(object, predictor) {
>> >>> >>
>> >>> >> if(class(object) != "MCMCglmm") {
>> >>> >> if(length( object[[1]]$Residual$nrt )>1) {
>> >>> >> object = lapply(object,FUN=function(x) { x$Residual$nrt<-2;x })
>> >>> >> }
>> >>> >> Sol = mcmc.list(lapply(object,FUN=function(x) { x$Sol}))
>> >>> >> vars = colnames(Sol[[1]])
>> >>> >> } else {
>> >>> >> Sol = as.data.frame(object$Sol)
>> >>> >> vars = names(Sol)
>> >>> >> }
>> >>> >> za_predictor = vars[ vars %ends_with% predictor & vars %begins_with%
>> >>> >> "traitza_"]
>> >>> >> za_intercept_name = vars[ ! vars %contains% ":" & vars %begins_with%
>> >>> >> "traitza_"]
>> >>> >> intercept = Sol[,"(Intercept)"]
>> >>> >> za_intercept = Sol[, za_intercept_name]
>> >>> >> l1 = Sol[, predictor ]
>> >>> >> l2 = Sol[, za_predictor ]
>> >>> >> if(is.list(object)) {
>> >>> >> intercept = unlist(intercept)
>> >>> >> za_intercept = unlist(za_intercept)
>> >>> >> l1 = unlist(l1)
>> >>> >> l2 = unlist(l2)
>> >>> >> }
>> >>> >> py_0 = dpois(0, exp(intercept + za_intercept))
>> >>> >> y_ygt0 = exp(intercept)
>> >>> >> at_intercept = (1-py_0) * y_ygt0
>> >>> >>
>> >>> >> py_0 = dpois(0, exp(intercept + za_intercept + l2))
>> >>> >> y_ygt0 = exp(intercept + l1)
>> >>> >> at_predictor_1 = (1-py_0) * y_ygt0
>> >>> >> print(qplot(at_intercept))
>> >>> >> print(qplot(at_predictor_1))
>> >>> >> df = data.frame("intercept" = at_intercept)
>> >>> >> df[, predictor] = at_predictor_1
>> >>> >> print(qplot(x=variable,
>> >>> >> y=value,data=suppressMessages(melt(df)),fill=variable,alpha=
>> I(0.40),
>> >>> geom =
>> >>> >> 'violin'))
>> >>> >> print(quantile(at_intercept, probs = c(0.05,0.5,0.95)))
>> >>> >> print(quantile(at_predictor_1, probs = c(0.05,0.5,0.95)))
>> >>> >> invisible(df)
>> >>> >> }
>> >>> >>
>> >>> >>
>> >>> >> > summary(object[[1]])
>> >>> >>
>> >>> >> Iterations = 100001:299901
>> >>> >> Thinning interval = 100
>> >>> >> Sample size = 2000
>> >>> >>
>> >>> >> DIC: 349094
>> >>> >>
>> >>> >> G-structure: ~idh(trait):idParents
>> >>> >>
>> >>> >> post.mean l-95% CI u-95% CI eff.samp
>> >>> >> children.idParents 0.0189 0.0164 0.0214 1729
>> >>> >> za_children.idParents 0.2392 0.2171 0.2622 1647
>> >>> >>
>> >>> >> R-structure: ~idh(trait):units
>> >>> >>
>> >>> >> post.mean l-95% CI u-95% CI eff.samp
>> >>> >> children.units 0.144 0.139 0.148 1715
>> >>> >> za_children.units 1.000 1.000 1.000 0
>> >>> >>
>> >>> >> Location effects: children ~ trait * (maternalage.factor +
>> >>> paternalloss +
>> >>> >> maternalloss + center(nr.siblings) + birth.cohort + urban + male +
>> >>> >> paternalage.mean + paternalage.diff)
>> >>> >>
>> >>> >> post.mean l-95% CI
>> u-95%
>> >>> CI
>> >>> >> eff.samp pMCMC
>> >>> >> (Intercept) 2.088717 2.073009
>> >>> 2.103357
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children -1.933491 -1.981945
>> >>> -1.887863
>> >>> >> 2000 <0.0005 ***
>> >>> >> maternalage.factor(14,20] 0.007709 -0.014238
>> >>> 0.027883
>> >>> >> 1500 0.460
>> >>> >> maternalage.factor(35,50] 0.006350 -0.009634
>> >>> 0.024107
>> >>> >> 2000 0.462
>> >>> >> paternallossTRUE 0.000797 -0.022716
>> >>> 0.025015
>> >>> >> 2000 0.925
>> >>> >> maternallossTRUE -0.015542 -0.040240
>> >>> 0.009549
>> >>> >> 2000 0.226
>> >>> >> center(nr.siblings) 0.005869 0.004302
>> >>> 0.007510
>> >>> >> 2000 <0.0005 ***
>> >>> >> birth.cohort(1703,1722] -0.045487 -0.062240
>> >>> -0.028965
>> >>> >> 2000 <0.0005 ***
>> >>> >> birth.cohort(1722,1734] -0.055872 -0.072856
>> >>> -0.036452
>> >>> >> 2000 <0.0005 ***
>> >>> >> birth.cohort(1734,1743] -0.039770 -0.056580
>> >>> -0.020907
>> >>> >> 2000 <0.0005 ***
>> >>> >> birth.cohort(1743,1750] -0.030713 -0.048301
>> >>> -0.012214
>> >>> >> 2000 0.002 **
>> >>> >> urban -0.076748 -0.093240
>> >>> -0.063002
>> >>> >> 2567 <0.0005 ***
>> >>> >> male 0.106074 0.095705
>> >>> 0.115742
>> >>> >> 2000 <0.0005 ***
>> >>> >> paternalage.mean -0.024119 -0.033133
>> >>> -0.014444
>> >>> >> 2000 <0.0005 ***
>> >>> >> paternalage.diff -0.018367 -0.032083
>> >>> -0.005721
>> >>> >> 2000 0.007 **
>> >>> >> traitza_children:maternalage.factor(14,20] -0.116510 -0.182432
>> >>> -0.051978
>> >>> >> 1876 0.001 ***
>> >>> >> traitza_children:maternalage.factor(35,50] -0.045196 -0.094485
>> >>> 0.002640
>> >>> >> 2000 0.075 .
>> >>> >> traitza_children:paternallossTRUE -0.171957 -0.238218
>> >>> -0.104820
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children:maternallossTRUE -0.499539 -0.566825
>> >>> -0.430637
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children:center(nr.siblings) -0.023723 -0.028676
>> >>> -0.018746
>> >>> >> 1848 <0.0005 ***
>> >>> >> traitza_children:birth.cohort(1703,1722] -0.026012 -0.074250
>> >>> 0.026024
>> >>> >> 2000 0.319
>> >>> >> traitza_children:birth.cohort(1722,1734] -0.279418 -0.329462
>> >>> -0.227187
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children:birth.cohort(1734,1743] -0.260165 -0.312659
>> >>> -0.204462
>> >>> >> 2130 <0.0005 ***
>> >>> >> traitza_children:birth.cohort(1743,1750] -0.481457 -0.534568
>> >>> -0.426648
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children:urban -0.604108 -0.645169
>> >>> -0.562554
>> >>> >> 1702 <0.0005 ***
>> >>> >> traitza_children:male -0.414988 -0.444589
>> >>> -0.387005
>> >>> >> 2000 <0.0005 ***
>> >>> >> traitza_children:paternalage.mean 0.006545 -0.018570
>> >>> 0.036227
>> >>> >> 2000 0.651
>> >>> >> traitza_children:paternalage.diff -0.097982 -0.136302
>> >>> -0.060677
>> >>> >> 2000 <0.0005 ***
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com>
>> wrote:
>> >>> >>
>> >>> >>> Ruben Arslan <rubenarslan at ...> writes:
>> >>> >>>
>> >>> >>> >
>> >>> >>> > Dear list,
>> >>> >>> >
>> >>> >>> > I wanted to ask: Is there any (maybe just back of the envelope)
>> way
>> >>> to
>> >>> >>> > obtain a response prediction for zero-inflated or hurdle type
>> >>> models?
>> >>> >>> > I've fit such models in MCMCglmm, but I don't work in ecology
>> and my
>> >>> >>> > previous experience with explaining such models to "my audience"
>> >>> did not
>> >>> >>> > bode well. When it comes to humans, the researchers I presented
>> to
>> >>> are
>> >>> >>> not
>> >>> >>> > used to offspring count being zero-inflated (or acquainted with
>> that
>> >>> >>> > concept), but in my historical data with high infant mortality,
>> it
>> >>> is
>> >>> >>> (in
>> >>> >>> > modern data it's actually slightly underdispersed).
>> >>> >>> >
>> >>> >>> > Currently I'm using lme4 and simply splitting my models into two
>> >>> stages
>> >>> >>> > (finding a mate and having offspring).
>> >>> >>> > That's okay too, but in one population the effect of interest is
>> not
>> >>> >>> > clearly visible in either stage, only when both are taken
>> together
>> >>> (but
>> >>> >>> > then the outcome is zero-inflated).
>> >>> >>> > I expect to be given a hard time for this and hence thought I'd
>> use
>> >>> a
>> >>> >>> > binomial model with the outcome offspring>0 as my main model, but
>> >>> that
>> >>> >>> > turns out to be hard to explain too and doesn't
>> >>> >>> > really do the data justice.
>> >>> >>> >
>> >>> >>> > Basically I don't want to be forced to discuss my smallest
>> >>> population
>> >>> >>> as a
>> >>> >>> > non-replication of the effect because I was insufficiently able
>> to
>> >>> >>> explain
>> >>> >>> > the statistics behind my reasoning that the effect shows.
>> >>> >>>
>> >>> >>> I think the back-of-the envelope answer would be that for a
>> >>> two-stage
>> >>> >>> model with a prediction of p_i for the probability of having a
>> >>> non-zero
>> >>> >>> response (or in the case of zero-inflated models, the probability
>> of
>> >>> >>> _not_ having a structural zero) and a prediction of n_i for the
>> >>> >>> conditional
>> >>> >>> part of the model, the mean predicted value is p_i*n_i and the
>> >>> >>> variance is _approximately_ (p_i*n_i)^2*(var(p_i)/p_i^2 +
>> >>> var(n_i)/n_i^2)
>> >>> >>> (this is assuming
>> >>> >>> that you haven't built in any correlation between p_i and n_i,
>> which
>> >>> >>> would be hard in lme4 but _might_ be possible under certain
>> >>> circumstances
>> >>> >>> via a multitype model in MCMCglmm).
>> >>> >>>
>> >>> >>> Does that help?
>> >>> >>>
>> >>> >>> _______________________________________________
>> >>> >>> R-sig-mixed-models at r-project.org mailing list
>> >>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>> >>>
>> >>> >>
>> >>> >
>> >>> > [[alternative HTML version deleted]]
>> >>> >
>> >>> > _______________________________________________
>> >>> > R-sig-mixed-models at r-project.org mailing list
>> >>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> The University of Edinburgh is a charitable body, registered in
>> >>> Scotland, with registration number SC005336.
>> >>>
>> >>>
>> >>>
>> >
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>>
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list