[R-sig-ME] Predictions from zero-inflated or hurdle models

Ruben Arslan rubenarslan at gmail.com
Wed Mar 18 14:51:50 CET 2015


Hi,

thanks for your reply. This clears some things up.

1. Colour me interested then, if you ever find the time to put it on
GitHub. Maybe after seeing the nonsense I've wrought in the following
points, you'll find that sending me a zip of the pre-release would save you
time compared to guessing where I went wrong?

2. I'm not sure I understand.
You're saying I can add the VCV components for the family random effect to
the respective unit components.
But I would only want that if I were interested in predicting the effect
for an arbitrary family, which is not my goal: my population data include
all families, and I want to know the effect for the average family in each
population (to compare it across populations).
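
A minimal sketch of the two options as I understand them (my notation: `m`
is a fitted MCMCglmm model, and the VCV column names are taken from the
model summary quoted further down, so treat them as assumptions):

    # conditional on the "average family" (family effect set to zero):
    # integrate over the unit-level (over-dispersion) variance only
    v_cond <- m$VCV[, "children.units"]
    # marginal over families: add the family variance to the unit variance
    v_marg <- m$VCV[, "children.idParents"] + m$VCV[, "children.units"]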

3. Oh right, I'm sorry, I think I got a little confused/sidetracked; these
models have been with me for a while now, and I learned about us()/idh()
back then.

4. Okay, I will keep this in mind, but for now I am focusing on getting
the prediction right for this simpler case.

5. I made a reproducible report
<http://rpubs.com/rubenarslan/repro_variab_mcmcglmm> to show the problem. I
think the problem is not due to the changes I made in my prediction helper
function but to something more fundamental.
It shows that for some toy data, the credible intervals for predictions
depend on the reference level I choose for a factor predictor.
This is less pronounced with the method I used initially, which simply
applies the inverse link.
However, both methods _underpredict_ as far as I can tell.

Surely I'm missing an important step, but I hope this illustrates what
confuses me. Of course the reasons you map out for the discrepancy
(variable confounding, etc.) are sound in theory, but I don't see such wide
CIs using any other method (simple descriptives, lme4).
Oh, and I used a different data simulation, because the one you used
actually induces zero-deflation as far as I can tell (a lot of zeroes, but
lambda is very low too).

Best regards and again thanks so much for taking the time,

Ruben


On Wed, Mar 18, 2015 at 7:06 AM Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:

> Hi,
>
> 6/ you are correct - my mistake.
>
>
> 5b/ I'm still not sure what you mean by the uncertainty, but ignoring
> the random effects will result in underprediction. For a Poisson
> log-normal the mean is exp(a_1 + v_1/2) rather than exp(a_1) for the
> standard Poisson.
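>
> A minimal simulation sketch of this point (illustrative values for a_1
> and v_1, chosen here for the example):
>
>    a_1 <- 1; v_1 <- 0.5
>    eta <- rnorm(1e6, a_1, sqrt(v_1))  # linear predictor with normal random effect
>    y <- rpois(1e6, exp(eta))          # Poisson log-normal draws
>    mean(y)                            # close to exp(a_1 + v_1/2) = 3.49
>    exp(a_1 + v_1/2); exp(a_1)         # 3.49 vs 2.72: dropping v_1 underpredicts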
>
> Cheers,
>
> Jarrod
>
>
>
> Quoting Ruben Arslan <rubenarslan at gmail.com> on Tue, 17 Mar 2015
> 19:21:06 +0000:
>
> > Hi,
> >
> > two more questions, sorry.
> >
> > 6. Could it be that where you wrote
> >
> >    normal.evd <- function(x, mu, v){
> >      exp(-exp(x)) * dnorm(x, mu, sqrt(v))
> >    }
> >
> > you're not actually passing the variance but the standard deviation? So
> > the functions should be:
> >
> >    normal.evd <- function(x, mu, sd){
> >      exp(-exp(x)) * dnorm(x, mu, sd)
> >    }
> >    normal.zt <- function(x, mu, sd){
> >      exp(x)/(1 - exp(-exp(x))) * dnorm(x, mu, sd)
> >    }
> >
> > Because at least in the example, you were taking the square root here
> > already: sd_1 <- mean(sqrt(m1$VCV[,1]))
> > Maybe this was a copy-paste from the source where you do it differently?
> > Or I misunderstood...
> >
> > An addition to question 5: when I use the correct inverse link function
> > in my old code, but don't do double integration, I do get a "reasonable"
> > amount of uncertainty around the reference factor. The predicted means
> > are too low, which I put down to this approach ignoring the VCV. I was
> > just wondering how this could happen (actually, seeing the "reasonable"
> > amount of uncertainty was what made me hopeful that my approach might
> > not be entirely wrong).
> >
> > Best,
> >
> > Ruben
> >
> >
> > On Tue, Mar 17, 2015 at 7:53 PM Ruben Arslan <rubenarslan at gmail.com> wrote:
> >
> >> Hi Jarrod,
> >>
> >> thanks for the extensive reply! This helps a lot, though it sounds like
> >> I was hubristic to attempt this myself.
> >> I tried using the approach you mapped out in the function gist
> >> <https://gist.github.com/rubenarslan/aeacdd306b3d061819a6> I posted. I
> >> simply put the pred function in a loop, so that I wouldn't make any
> >> mistakes while vectorising; I don't care about performance at this
> >> point anyway.
> >>
> >> Of course, I have some follow-up questions though... I'm sorry if I'm
> >> being a pain; I really appreciate the free advice, but understand of
> >> course if other things take precedence.
> >>
> >> 1. You're not "down with" developing publicly, are you? Because I sure
> >> would like to test-drive the newdata prediction and simulation
> >> functions...
> >>
> >> 2. Could you make sure that I got this right: "When predictions are to
> >> be taken after marginalising the random effects (including the
> >> `residual' over-dispersion) it is not possible to obtain closed form
> >> expressions."
> >> That is basically my scenario, right? In the example I included, I also
> >> had a group-level random effect (family). Or are you talking about the
> >> "trait" as the random effect (as in your example), so that my scenario
> >> is different and I cannot apply the numerical double integration
> >> procedure you posted?
> >> To be clear about my prediction goal, without using language that I
> >> might be using incorrectly: I want to show what the average effect in
> >> the response unit, number of children, is in my population(s). I have
> >> data on whole populations and am using all of it (except individuals
> >> who don't have completed fertility yet, because I have yet to find a
> >> way to model both zero-inflation and right censoring).
> >>
> >> 3. "Numerical integration could be extended to double integration in
> >> which case covariance between the Poisson part and the binary part could
> >> be handled." That is what you posted an example of and it applies to my
> >> scenario, because I specified a prior R=list(V=diag(2), nu=1.002, fix=2)
> >> and rcov=~idh(trait):units, random=~idh(trait):idParents?
> >> But this double integration approach is something you just wrote
> >> off-the-cuff and I probably shouldn't use it in a publication? Or is
> this
> >> in the forthcoming MCMCglmm release and I might actually be able to
> refer
> >> to it once I get to submitting?
> >>
> >> 4. Could I change my model specification to forbid covariance between
> >> the two parts and not shoot myself in the foot? Would this allow for a
> >> more valid/tested approach to prediction?
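> >>
> >> For reference, my understanding of the two residual specifications:
> >> idh() estimates separate variances for the two parts but fixes their
> >> covariance at zero, while us() also estimates the covariance.
> >>
> >>    rcov = ~idh(trait):units   # no covariance between the two parts
> >>    rcov = ~us(trait):units    # covariance estimated as well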
> >>
> >> 5. When I use your method on my real data, I get less variation around
> >> the prediction "for the reference level" than for all other factor
> >> levels. My reference level actually has fewer cases than the others, so
> >> this isn't "right" in a way.
> >> Is this because I'm not doing newdata prediction? I get the
> >> "right"-looking uncertainty if I bootstrap newdata predictions in lme4.
> >> Sorry if this is children's logic :-)
> >> Here's an image of the prediction
> >> <http://rpubs.com/rubenarslan/mcmcglmm_pred> and the raw data
> >> <http://rpubs.com/rubenarslan/raw>.
> >>
> >> Many thanks for any answers that you feel inclined to give.
> >>
> >> Best wishes,
> >>
> >> Ruben
> >>
> >> On Tue, Mar 17, 2015 at 5:31 PM Jarrod Hadfield <j.hadfield at ed.ac.uk>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Sorry - I should have replied to this post earlier. I've been working
> >>> on predict/simulate methods for all MCMCglmm models (including
> >>> zero-inflated/altered/hurdle/truncated models) and these are now
> >>> largely complete (together with newdata options).
> >>>
> >>> When predictions are to be taken after marginalising the random
> >>> effects (including the `residual' over-dispersion) it is not possible
> >>> to obtain closed form expressions. The options that will be available
> >>> in MCMCglmm are:
> >>>
> >>> 1) algebraic approximation
> >>> 2) numerical integration
> >>> 3) simulation
> >>>
> >>> 1) and 2) are currently only accurate when the random/residual effect
> >>> structure implies no covariance between the Poisson part and the
> >>> binary part.
> >>>
> >>> 1) is reasonably accurate for zero-inflated distributions, but can be
> >>> pretty poor for the remainder because they all involve zero-truncated
> >>> Poisson log-normal distributions and my Taylor approximation for the
> >>> mean is less than ideal (any suggestions would be helpful).
> >>>
> >>> 2) could be extended to double integration in which case covariance
> >>> between the Poisson part and the binary part could be handled.
> >>>
> >>> In your code, part of the problem is that you have fitted a zapoisson,
> >>> but the prediction is based on a zipoisson (which uses a logit link
> >>> rather than the zapoisson's complementary log-log link).
> >>>
> >>> In all zero-inflated/altered/hurdle/truncated models
> >>>
> >>> E[y] = E[(1-prob)*meanc]
> >>>
> >>> where prob is the probability of a zero in the binary part and meanc is
> >>> the mean of a Poisson distribution (zipoisson) or a zero-truncated
> >>> Poisson (zapoisson and hupoisson). If we have eta_1 as the linear
> >>> predictor for the Poisson part and eta_2 as the linear predictor for
> >>> the binary part:
> >>>
> >>> In zipoisson: prob = plogis(eta_2)     and meanc = exp(eta_1)
> >>> In zapoisson: prob = exp(-exp(eta_2))  and meanc = exp(eta_1)/(1-exp(-exp(eta_1)))
> >>> In hupoisson: prob = plogis(eta_2)     and meanc = exp(eta_1)/(1-exp(-exp(eta_1)))
> >>> In ztpoisson: prob = 0                 and meanc = exp(eta_1)/(1-exp(-exp(eta_1)))
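> >>>
> >>> A compact R translation of the four cases above (a sketch; eta_1 and
> >>> eta_2 are the two linear predictors, random effects ignored here):
> >>>
> >>>    ey <- function(eta_1, eta_2,
> >>>                   type = c("zipoisson", "zapoisson", "hupoisson", "ztpoisson")){
> >>>      type <- match.arg(type)
> >>>      prob <- switch(type,
> >>>                     zipoisson = plogis(eta_2),
> >>>                     zapoisson = exp(-exp(eta_2)),
> >>>                     hupoisson = plogis(eta_2),
> >>>                     ztpoisson = 0)
> >>>      # zipoisson keeps the ordinary Poisson mean; the other three use
> >>>      # the zero-truncated Poisson mean
> >>>      meanc <- switch(type,
> >>>                      zipoisson = exp(eta_1),
> >>>                      exp(eta_1)/(1 - exp(-exp(eta_1))))
> >>>      (1 - prob) * meanc
> >>>    }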
> >>>
> >>> In each case the linear predictor has a `fixed' part and a `random'
> >>> part which I'll denote as `a' and `u' respectively. Ideally we would
> >>> want
> >>>
> >>> E[(1-prob)*meanc] taken over u_1 & u_2
> >>>
> >>> if prob and meanc are independent this is a bit easier as
> >>>
> >>> E[y] = E[1-prob]E[meanc]
> >>>
> >>> and the two expectations only need to be taken with respect to their
> >>> respective random effects. If we have sd_1 and sd_2 as the standard
> >>> deviations of the two sets of random effects, then for the zapoisson:
> >>>
> >>>    # integrand for E[exp(-exp(x))], the zero probability in the binary part
> >>>    normal.evd <- function(x, mu, v){
> >>>      exp(-exp(x)) * dnorm(x, mu, sqrt(v))
> >>>    }
> >>>    # integrand for the zero-truncated Poisson mean
> >>>    normal.zt <- function(x, mu, v){
> >>>      exp(x)/(1 - exp(-exp(x))) * dnorm(x, mu, sqrt(v))
> >>>    }
> >>>
> >>>    pred <- function(a_1, a_2, sd_1, sd_2){
> >>>      prob <- 1 - integrate(normal.evd, qnorm(0.0001, a_2, sd_2),
> >>>                            qnorm(0.9999, a_2, sd_2), a_2, sd_2)[[1]]
> >>>      meanc <- integrate(normal.zt, qnorm(0.0001, a_1, sd_1),
> >>>                         qnorm(0.9999, a_1, sd_1), a_1, sd_1)[[1]]
> >>>      prob * meanc   # here prob is already 1 - P(zero)
> >>>    }
> >>>
> >>> #  gives the expected value with reasonable accuracy.  As an example:
> >>>
> >>>    x<-rnorm(300)
> >>>    l1<-rnorm(300, 1/2+x, sqrt(1))
> >>>    l2<-rnorm(300, 1-x, sqrt(1))
> >>>
> >>>    y<-rbinom(300, 1, 1-exp(-exp(l2)))
> >>>    y[which(y==1)] <- qpois(runif(sum(y==1), dpois(0, exp(l1[which(y==1)])), 1),
> >>>                            exp(l1[which(y==1)]))
> >>>    # cunning sampler from Peter Dalgaard (R-sig-mixed)
> >>>
> >>>    data=data.frame(y=y, x=x)
> >>>    prior=list(R=list(V=diag(2), nu=0.002, fix=2))
> >>>
> >>>    m1 <- MCMCglmm(y ~ trait + trait:x - 1, rcov = ~idh(trait):units,
> >>>                   data = data, family = "zapoisson", prior = prior)
> >>>
> >>>    b_1<-colMeans(m1$Sol)[c(1,3)]
> >>>    b_2<-colMeans(m1$Sol)[c(2,4)]
> >>>    sd_1<-mean(sqrt(m1$VCV[,1]))
> >>>    sd_2<-mean(sqrt(m1$VCV[,2]))
> >>>
> >>>    # note it is more accurate to take the posterior mean prediction
> >>>    # rather than the prediction from the posterior means as I've done
> >>>    # here, but for illustration:
> >>>
> >>>    x.pred <- seq(-3, 3, length = 100)
> >>>    p <- 1:100
> >>>    for(i in 1:100){
> >>>      p[i] <- pred(a_1 = b_1[1] + x.pred[i]*b_1[2],
> >>>                   a_2 = b_2[1] + x.pred[i]*b_2[2],
> >>>                   sd_1 = sd_1, sd_2 = sd_2)
> >>>    }
> >>>
> >>>    plot(y ~ x)
> >>>    lines(p ~ x.pred)
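> >>>
> >>>    # A sketch of the more accurate posterior-mean prediction mentioned
> >>>    # above: average pred() over the MCMC samples rather than plugging
> >>>    # in posterior means (slow; thin the samples in practice, and note
> >>>    # the sd-versus-variance issue raised in question 6 earlier in the
> >>>    # thread):
> >>>    p2 <- sapply(x.pred, function(xp){
> >>>      mean(sapply(1:nrow(m1$Sol), function(j){
> >>>        pred(a_1 = m1$Sol[j, 1] + xp*m1$Sol[j, 3],
> >>>             a_2 = m1$Sol[j, 2] + xp*m1$Sol[j, 4],
> >>>             sd_1 = sqrt(m1$VCV[j, 1]),
> >>>             sd_2 = sqrt(m1$VCV[j, 2]))
> >>>      }))
> >>>    })
> >>>    lines(p2 ~ x.pred, lty = 2)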
> >>>
> >>> Cheers,
> >>>
> >>> Jarrod
> >>>
> >>>
> >>>
> >>>
> >>> Quoting Ruben Arslan <rubenarslan at gmail.com> on Tue, 17 Mar 2015
> >>> 13:33:25 +0000:
> >>>
> >>> > Dear list,
> >>> >
> >>> > I've made a reproducible example of the zero-altered prediction,
> >>> > in the hope that someone will have a look and reassure me that I'm
> >>> > going about this the right way.
> >>> > I was a bit confused by the point about p_i and n_i being correlated
> >>> > (they are in my case), but I think this was a red herring for me,
> >>> > since I'm not deriving the variance analytically.
> >>> > The script is here:
> >>> > https://gist.github.com/rubenarslan/aeacdd306b3d061819a6
> >>> > and if you don't want to run the simulation fit yourself, I've put an
> >>> > RDS file of the fit here:
> >>> > https://dl.dropboxusercontent.com/u/1078620/m1.rds
> >>> >
> >>> > Best regards,
> >>> >
> >>> > Ruben Arslan
> >>> >
> >>> > On Tue, Mar 10, 2015 at 1:22 PM Ruben Arslan <rubenarslan at gmail.com> wrote:
> >>> >
> >>> >> Dear Dr Bolker,
> >>> >>
> >>> >> I'd thought about something like this; one point of asking was to
> >>> >> see whether a) it's implemented already, because I'll probably make
> >>> >> dumb mistakes while trying, or b) it's not implemented because it's
> >>> >> a bad idea.
> >>> >> Your response and the MCMCglmm course notes make me hope that it's
> >>> >> c) not implemented because nobody has done it yet, or d) so simple
> >>> >> that everybody does it on the fly.
> >>> >>
> >>> >> So I tried my hand at it and would appreciate corrections. I am
> >>> >> sure there is some screw-up or an inelegant approach in there.
> >>> >> I included code for dealing with mcmc.lists because that's what I
> >>> >> have, and I'm not entirely sure that how I deal with them is correct
> >>> >> either.
> >>> >>
> >>> >> I started with a zero-altered model, because those fit fastest and,
> >>> >> according to the course notes, have the least complex likelihood.
> >>> >> Because I know not what I do, I'm not dealing with my random effects
> >>> >> at all.
> >>> >>
> >>> >> I pasted a model summary below to show what I've applied the below
> >>> >> function to. The function gives the following quantiles when applied
> >>> >> to 19 chains of that model.
> >>> >>          5%         50%         95%
> >>> >> 5.431684178 5.561211207 5.690655200
> >>> >>          5%         50%         95%
> >>> >> 5.003974382 5.178192327 5.348246558
> >>> >>
> >>> >> Warm regards,
> >>> >>
> >>> >> Ruben Arslan
> >>> >>
> >>> >> # Requires mcmc.list() from coda, qplot() from ggplot2 and melt()
> >>> >> # from reshape2; %begins_with%, %ends_with% and %contains% are my own
> >>> >> # string-matching helper operators (not shown here).
> >>> >> HPDpredict_za = function(object, predictor) {
> >>> >>   if (class(object) != "MCMCglmm") {
> >>> >>     # a list of chains: work around nrt, then pool the Sol samples
> >>> >>     if (length(object[[1]]$Residual$nrt) > 1) {
> >>> >>       object = lapply(object, FUN = function(x) { x$Residual$nrt <- 2; x })
> >>> >>     }
> >>> >>     Sol = mcmc.list(lapply(object, FUN = function(x) { x$Sol }))
> >>> >>     vars = colnames(Sol[[1]])
> >>> >>   } else {
> >>> >>     Sol = as.data.frame(object$Sol)
> >>> >>     vars = names(Sol)
> >>> >>   }
> >>> >>   # pick out the zero-alteration ("traitza_") columns for this predictor
> >>> >>   za_predictor = vars[vars %ends_with% predictor & vars %begins_with% "traitza_"]
> >>> >>   za_intercept_name = vars[!vars %contains% ":" & vars %begins_with% "traitza_"]
> >>> >>   intercept = Sol[, "(Intercept)"]
> >>> >>   za_intercept = Sol[, za_intercept_name]
> >>> >>   l1 = Sol[, predictor]
> >>> >>   l2 = Sol[, za_predictor]
> >>> >>   if (is.list(object)) {
> >>> >>     intercept = unlist(intercept)
> >>> >>     za_intercept = unlist(za_intercept)
> >>> >>     l1 = unlist(l1)
> >>> >>     l2 = unlist(l2)
> >>> >>   }
> >>> >>   # prediction at the intercept: P(y > 0) times the conditional mean
> >>> >>   py_0 = dpois(0, exp(intercept + za_intercept))
> >>> >>   y_ygt0 = exp(intercept)
> >>> >>   at_intercept = (1 - py_0) * y_ygt0
> >>> >>   # prediction with the predictor set to 1
> >>> >>   py_0 = dpois(0, exp(intercept + za_intercept + l2))
> >>> >>   y_ygt0 = exp(intercept + l1)
> >>> >>   at_predictor_1 = (1 - py_0) * y_ygt0
> >>> >>   print(qplot(at_intercept))
> >>> >>   print(qplot(at_predictor_1))
> >>> >>   df = data.frame("intercept" = at_intercept)
> >>> >>   df[, predictor] = at_predictor_1
> >>> >>   print(qplot(x = variable, y = value, data = suppressMessages(melt(df)),
> >>> >>               fill = variable, alpha = I(0.40), geom = 'violin'))
> >>> >>   print(quantile(at_intercept, probs = c(0.05, 0.5, 0.95)))
> >>> >>   print(quantile(at_predictor_1, probs = c(0.05, 0.5, 0.95)))
> >>> >>   invisible(df)
> >>> >> }
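> >>> >>
> >>> >> A hypothetical call (predictor name taken from the model summary
> >>> >> below):
> >>> >>
> >>> >>   HPDpredict_za(object, "paternalage.diff")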
> >>> >>
> >>> >>
> >>> >> > summary(object[[1]])
> >>> >>
> >>> >>  Iterations = 100001:299901
> >>> >>  Thinning interval  = 100
> >>> >>  Sample size  = 2000
> >>> >>
> >>> >>  DIC: 349094
> >>> >>
> >>> >>  G-structure:  ~idh(trait):idParents
> >>> >>
> >>> >>                       post.mean l-95% CI u-95% CI eff.samp
> >>> >> children.idParents       0.0189   0.0164   0.0214     1729
> >>> >> za_children.idParents    0.2392   0.2171   0.2622     1647
> >>> >>
> >>> >>  R-structure:  ~idh(trait):units
> >>> >>
> >>> >>                   post.mean l-95% CI u-95% CI eff.samp
> >>> >> children.units        0.144    0.139    0.148     1715
> >>> >> za_children.units     1.000    1.000    1.000        0
> >>> >>
> >>> >>  Location effects: children ~ trait * (maternalage.factor +
> >>> >> paternalloss + maternalloss + center(nr.siblings) + birth.cohort +
> >>> >> urban + male + paternalage.mean + paternalage.diff)
> >>> >>
> >>> >>                                             post.mean  l-95% CI  u-95% CI eff.samp   pMCMC
> >>> >> (Intercept)                                  2.088717  2.073009  2.103357     2000 <0.0005 ***
> >>> >> traitza_children                            -1.933491 -1.981945 -1.887863     2000 <0.0005 ***
> >>> >> maternalage.factor(14,20]                    0.007709 -0.014238  0.027883     1500   0.460
> >>> >> maternalage.factor(35,50]                    0.006350 -0.009634  0.024107     2000   0.462
> >>> >> paternallossTRUE                             0.000797 -0.022716  0.025015     2000   0.925
> >>> >> maternallossTRUE                            -0.015542 -0.040240  0.009549     2000   0.226
> >>> >> center(nr.siblings)                          0.005869  0.004302  0.007510     2000 <0.0005 ***
> >>> >> birth.cohort(1703,1722]                     -0.045487 -0.062240 -0.028965     2000 <0.0005 ***
> >>> >> birth.cohort(1722,1734]                     -0.055872 -0.072856 -0.036452     2000 <0.0005 ***
> >>> >> birth.cohort(1734,1743]                     -0.039770 -0.056580 -0.020907     2000 <0.0005 ***
> >>> >> birth.cohort(1743,1750]                     -0.030713 -0.048301 -0.012214     2000   0.002 **
> >>> >> urban                                       -0.076748 -0.093240 -0.063002     2567 <0.0005 ***
> >>> >> male                                         0.106074  0.095705  0.115742     2000 <0.0005 ***
> >>> >> paternalage.mean                            -0.024119 -0.033133 -0.014444     2000 <0.0005 ***
> >>> >> paternalage.diff                            -0.018367 -0.032083 -0.005721     2000   0.007 **
> >>> >> traitza_children:maternalage.factor(14,20] -0.116510 -0.182432 -0.051978     1876   0.001 ***
> >>> >> traitza_children:maternalage.factor(35,50] -0.045196 -0.094485  0.002640     2000   0.075 .
> >>> >> traitza_children:paternallossTRUE           -0.171957 -0.238218 -0.104820     2000 <0.0005 ***
> >>> >> traitza_children:maternallossTRUE           -0.499539 -0.566825 -0.430637     2000 <0.0005 ***
> >>> >> traitza_children:center(nr.siblings)        -0.023723 -0.028676 -0.018746     1848 <0.0005 ***
> >>> >> traitza_children:birth.cohort(1703,1722]    -0.026012 -0.074250  0.026024     2000   0.319
> >>> >> traitza_children:birth.cohort(1722,1734]    -0.279418 -0.329462 -0.227187     2000 <0.0005 ***
> >>> >> traitza_children:birth.cohort(1734,1743]    -0.260165 -0.312659 -0.204462     2130 <0.0005 ***
> >>> >> traitza_children:birth.cohort(1743,1750]    -0.481457 -0.534568 -0.426648     2000 <0.0005 ***
> >>> >> traitza_children:urban                      -0.604108 -0.645169 -0.562554     1702 <0.0005 ***
> >>> >> traitza_children:male                       -0.414988 -0.444589 -0.387005     2000 <0.0005 ***
> >>> >> traitza_children:paternalage.mean            0.006545 -0.018570  0.036227     2000   0.651
> >>> >> traitza_children:paternalage.diff           -0.097982 -0.136302 -0.060677     2000 <0.0005 ***
> >>> >>
> >>> >>
> >>> >> On Mon, Mar 9, 2015 at 10:12 PM Ben Bolker <bbolker at gmail.com> wrote:
> >>> >>
> >>> >>> Ruben Arslan <rubenarslan at ...> writes:
> >>> >>>
> >>> >>> >
> >>> >>> > Dear list,
> >>> >>> >
> >>> >>> > I wanted to ask: is there any (maybe just back-of-the-envelope)
> >>> >>> > way to obtain a response prediction for zero-inflated or
> >>> >>> > hurdle-type models?
> >>> >>> > I've fit such models in MCMCglmm, but I don't work in ecology,
> >>> >>> > and my previous experience with explaining such models to "my
> >>> >>> > audience" did not bode well. When it comes to humans, the
> >>> >>> > researchers I presented to are not used to offspring count being
> >>> >>> > zero-inflated (or acquainted with that concept), but in my
> >>> >>> > historical data with high infant mortality, it is (in modern data
> >>> >>> > it's actually slightly underdispersed).
> >>> >>> >
> >>> >>> > Currently I'm using lme4 and simply splitting my models into two
> >>> >>> > stages (finding a mate and having offspring).
> >>> >>> > That's okay too, but in one population the effect of interest is
> >>> >>> > not clearly visible in either stage, only when both are taken
> >>> >>> > together (but then the outcome is zero-inflated).
> >>> >>> > I expect to be given a hard time for this and hence thought I'd
> >>> >>> > use a binomial model with the outcome offspring > 0 as my main
> >>> >>> > model, but that turns out to be hard to explain too and doesn't
> >>> >>> > really do the data justice.
> >>> >>> >
> >>> >>> > Basically, I don't want to be forced to discuss my smallest
> >>> >>> > population as a non-replication of the effect because I was
> >>> >>> > insufficiently able to explain the statistics behind my reasoning
> >>> >>> > that the effect shows.
> >>> >>>
> >>> >>>   I think the back-of-the-envelope answer would be that for a
> >>> >>> two-stage model with a prediction of p_i for the probability of
> >>> >>> having a non-zero response (or, in the case of zero-inflated models,
> >>> >>> the probability of _not_ having a structural zero) and a prediction
> >>> >>> of n_i for the conditional part of the model, the mean predicted
> >>> >>> value is p_i*n_i and the variance is _approximately_
> >>> >>> (p_i*n_i)^2*(var(p_i)/p_i^2 + var(n_i)/n_i^2)
> >>> >>> (this is assuming that you haven't built in any correlation between
> >>> >>> p_i and n_i, which would be hard in lme4 but _might_ be possible
> >>> >>> under certain circumstances via a multitype model in MCMCglmm).
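> >>> >>>
> >>> >>> As a minimal numeric sketch of this approximation (an illustration
> >>> >>> with made-up point estimates and variances):
> >>> >>>
> >>> >>>    p <- 0.8;  var_p <- 0.02^2   # hypothetical P(non-zero) and its variance
> >>> >>>    n <- 3.5;  var_n <- 0.15^2   # hypothetical conditional mean and its variance
> >>> >>>    mean_y <- p * n
> >>> >>>    # delta-method variance, assuming p and n are independent
> >>> >>>    var_y <- (p * n)^2 * (var_p / p^2 + var_n / n^2)
> >>> >>>    c(mean = mean_y, sd = sqrt(var_y))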
> >>> >>>
> >>> >>>   Does that help?
> >>> >>>
> >>> >>
> >>> >
> >>> >
> >
>
>
