[R-sig-ME] confidence intervals with mvrnorm - upper value equal to inf

Alessandra Bielli
Tue Oct 20 00:39:02 CEST 2020


You are right, I missed the *truncated* in the nbinom2 model.
In principle, I wanted to use a hurdle model because we are talking
about how many individuals are entangled in a net: some zeros are
caused by the presence of a treatment (reducing entanglement), while
other zeros are caused by the absence of individuals in the water. I
thought a hurdle model would be appropriate.
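
For anyone reading along, here is a minimal sketch of the two
specifications as I understand them in glmmTMB (variable names are from
my data; the exact calls may differ slightly from what I ran). With
ziformula = ~1, a truncated family gives a hurdle model, while a
non-truncated family gives a zero-inflation model:

library(glmmTMB)

## hurdle (zero-altered) Poisson: every zero comes from the zero part;
## the truncated conditional part only describes the positive counts
m1.zitp <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
                     (1 | tripID), ziformula = ~1,
                   family = "truncated_poisson", data = x)

## zero-inflated Poisson: the conditional part can also produce zeros
m1.zip <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
                    (1 | tripID), ziformula = ~1,
                  family = "poisson", data = x)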

I have now tested both truncated and non-truncated nbinom and Poisson
models, and the non-truncated Poisson is the top model in terms of AIC;
its diagnostic plots are also good. (A sketch of how such a comparison
can be set up follows the table below.)
        dAIC df
m1.zip   0.0  4
m1.nb   16.7  5
m1.zitp 17.7  4
m1.tnb  20.8  5
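
Roughly, the comparison could be set up like this (a sketch only, not
necessarily my exact code; I am assuming update() can swap the family,
otherwise the two negative binomial models can be refit with glmmTMB()
directly):

## negative binomial versions of the two Poisson fits sketched above
m1.nb  <- update(m1.zip,  family = "nbinom2")
m1.tnb <- update(m1.zitp, family = "truncated_nbinom2")

## delta-AIC table like the one above
aic <- AIC(m1.zip, m1.nb, m1.zitp, m1.tnb)
aic$dAIC <- aic$AIC - min(aic$AIC)
aic[order(aic$dAIC), c("dAIC", "df")]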

I get sensible CI values this time (pred_plot attached).
It seems to me that this model fits the data well; does it make sense
to use it instead of the hurdle model?
I have searched online for ways to compute the likelihood profile for
a zero-inflated model, but I haven't found any help. Is there any
resource you'd recommend?
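
Based on ?confint.glmmTMB, I was planning to try something like the
following (a sketch; I believe method = "profile" is supported, but I
have not confirmed how well it works for the zero-inflation
parameters). If the Wald and profile intervals differ a lot, that
would itself suggest the multivariate normal approximation is poor:

## m1.zip is the zero-inflated Poisson fit from above
confint(m1.zip, method = "wald")
confint(m1.zip, method = "profile")  # can be slow with random effects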

Going back to the Inf values: would (quasi-)complete separation be
possible for models other than binomial, though? My response is not
binary; it is a count, and I have some positive values greater than 1.
I used the same code for the four models tested, and only the
truncated models (Poisson and nbinom2) produce Inf CI values.
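
One check I thought of (a sketch): cross-tabulate the treatment
against whether a trial had any entanglement at all. If one pinger
level has no positive counts, the truncated (conditional) part of the
hurdle models is effectively estimated from a single group, which
behaves like complete separation even though the response is a count:

with(x, table(Used_piNgers, Dolphins.TOT > 0))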

Have a nice start of the week!

Alessandra


On Thu, Oct 15, 2020 at 1:18 PM Ben Bolker <bbolker using gmail.com> wrote:
>
>
>
> On 10/15/20 4:38 PM, Alessandra Bielli wrote:
> > Thanks for the clarifications.
> >
> > > So if I understand right, model diagnostics can tell me if my model
> > > parameters are multivariate normal. I think they are; see the DHARMa
> > > diagnostic plot attached.
>
>     No, that's still not evaluating the multivariate normality of the
> sampling distribution of the model parameters.  Computing the likelihood
> profile is the main way to do that.
> >
> > The values you requested:
> > > X.cond[1,]
> >   (Intercept) Used_piNgersN
> >             1             0
> > > beta.cond
> >   (Intercept) Used_piNgersN
> >     -19.77343      19.26693
> > > vcov(m1)$cond
> >               (Intercept) Used_piNgersN
> > (Intercept)     168344705    -168344704
> > Used_piNgersN  -168344704     168344705
> >
> > I noted that the Std. Error values in the conditional model are very
> > high as well (12974.8).
> > I ran the same model but set the family to nbinom2
> >
> > m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
> > (1|tripID), ziformula = ~1,   data=x, family = "nbinom2")
> >
> > and the Inf values do not appear. Was it an overdispersion problem? Can
> > I use the negative binomial model instead?
>
>
>    I would normally say this is an example of (quasi)complete separation
> <http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#penalizationhandling-complete-separation>
> (large beta values, ridiculous Wald standard errors)
>
>     But it may be overdispersion as well (since it disappears with nbinom2).
>
>     Why are you using a hurdle model in one case (truncated Poisson) and
> a zero-inflation model (*non*-truncated nbinom2) in the other?
>
>
>
>
> >
> > Sorry, I have tried to simulate my data to provide you with more
> > information but I am not sure I am doing it correctly. The effort
> > varies in each trial and I believe it can affect the value of
> > Dolphins.TOT.
> >
> > On Wed, Oct 14, 2020 at 7:24 PM Ben Bolker <bbolker using gmail.com> wrote:
> >>
> >>     A few clarifying points below.
> >>
> >> On 10/14/20 3:11 PM, Alessandra Bielli wrote:
> >>> Hi Ben
> >>>
> >>> That's a good point; my parameters are not multivariate normal.
> >>> But then, why was this method used in the Salamander example, where
> >>> the data are zero-inflated and so, I assume, not multivariate normal?
> >>
> >>     The multivariate normality refers to the sampling distribution of the
> >> *model parameters*, not to the distribution (marginal or conditional) of
> >> the response variable.  In general the sampling distribution of the
> >> model parameters is multivariate normal if the data are "well enough
> >> behaved" (i.e., either the responses themselves are normal *or* there is
> >> enough data, and enough information in the data, for the log-likelihood
> >> to become quadratic in a sufficiently large neighborhood of the MLE ...)
> >>
> >>
> >>>
> >>> I haven't simulated the data to share yet, but the "Inf" first appears
> >>> at the line >pred.ucount.psim =
> >>> exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
> >>> Only the first row, Used_piNgers=Y, shows "Inf".
> >>
> >>      Hmm.  That's a little surprising; it implies that pred.cond.psim
> >> contains large values (>700 or so, which for a log-scale parameter is
> >> huge).  Can you show us X.cond[1,], beta.cond, and vcov(m1)$cond?
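
(Side note for the archive: a quick way to see this directly, assuming
the pred.cond.psim matrix from the code further down the thread is
available; exp() overflows to Inf once a log-scale value exceeds
roughly 709.)

summary(as.vector(pred.cond.psim))          # simulated log-scale linear predictors
mean(!is.finite(exp(pred.cond.psim[1, ])))  # share of draws overflowing for row 1, Used_piNgers = "Y"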
> >>>
> >>> Thanks,
> >>> Alessandra
> >>>
> >>> On Sun, Oct 11, 2020 at 3:46 PM Ben Bolker <bbolker using gmail.com> wrote:
> >>>>
> >>>>      It's hard to troubleshoot this without a reproducible example. Unless
> >>>> the answer is obvious -- which it's not, to me -- the easiest way to
> >>>> troubleshoot is to work through the steps one at a time and see where
> >>>> the infinite values first appear.  Can you create such an example either
> >>>> by posting your data or by simulating data that looks like your data?
> >>>>
> >>>>      The posterior predictive simulation approach assumes that the
> >>>> sampling distributions of the parameters are multivariate normal, which
> >>>> is likely to be questionable in a low-information setting (which will be
> >>>> the case if you don't have very many non-zero values ...)
> >>>>
> >>>>
> >>>> On 10/10/20 7:28 AM, Alessandra Bielli wrote:
> >>>>> Dear list,
> >>>>>
> >>>>> I am trying to predict a value and CI for two different treatments from a
> >>>>> glmmTMB fitted model using posterior predictive simulations (mvrnorm
> >>>>> function in the MASS package) as in the Salamander example, Brooks 2017
> >>>>> appendix B
> >>>>> <https://www.biorxiv.org/content/biorxiv/suppl/2017/05/01/132753.DC1/132753-2.pdf>.
> >>>>> The dependent variable is a count; the majority of values are zeros,
> >>>>> but some positive values appear.
> >>>>>
> >>>>> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
> >>>>> (1|Trip_Code), ziformula =~1,
> >>>>>                  data=x, family = "truncated_poisson")
> >>>>>
> >>>>> newdata0 = with(x,
> >>>>>                 expand.grid(Used_piNgers = c("Y","N"),
> >>>>>                             Effort = 1))
> >>>>>
> >>>>> X.cond = model.matrix(lme4::nobars(formula(m1)[-2]), newdata0)
> >>>>>
> >>>>> beta.cond = fixef(m1)$cond
> >>>>> pred.cond = X.cond %*% beta.cond
> >>>>> ziformula = m1$modelInfo$allForm$ziformula
> >>>>> X.zi = model.matrix(lme4::nobars(ziformula), newdata0)
> >>>>> beta.zi = fixef(m1)$zi
> >>>>> pred.zi = X.zi %*% beta.zi
> >>>>>
> >>>>> pred.ucount = exp(pred.cond)*(1-plogis(pred.zi))
> >>>>>
> >>>>>
> >>>>> set.seed(101)
> >>>>> pred.condpar.psim = mvrnorm(1000,mu=beta.cond,Sigma=vcov(m1)$cond)
> >>>>> pred.cond.psim = X.cond %*% t(pred.condpar.psim)
> >>>>> pred.zipar.psim = mvrnorm(1000,mu=beta.zi,Sigma=vcov(m1)$zi)
> >>>>> pred.zi.psim = X.zi %*% t(pred.zipar.psim)
> >>>>> pred.ucount.psim = exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
> >>>>> ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975)))
> >>>>> ci.ucount = data.frame(ci.ucount)
> >>>>> names(ci.ucount) = c("ucount.low","ucount.high")
> >>>>> pred.ucount = data.frame(newdata0, pred.ucount, ci.ucount)
> >>>>>
> >>>>> For my upper CI, I get a value equal to Inf:
> >>>>>
> >>>>> Used_piNgers Effort  pred.ucount ucount.low ucount.high
> >>>>> 1            Y      1 6.758889e-11 0.00000000         Inf
> >>>>> 2            N      1 1.575418e-02 0.00223033   0.1096139
> >>>>>
> >>>>> Is the Inf caused by the very low variability of values in my dataset? I
> >>>>> tried to lower the upper quantile of the CI, ci.ucount =
> >>>>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975))), and only when I
> >>>>> lowered it to 0.475, i.e. ci.ucount =
> >>>>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.475))), did I obtain:
> >>>>>
> >>>>> Used_piNgers Effort  pred.ucount ucount.low  ucount.high
> >>>>> 1            Y      1 6.758889e-11 0.00000000 7.117465e+12
> >>>>> 2            N      1 1.575418e-02 0.00223033 1.454579e-02
> >>>>>
> >>>>> I found a related post
> >>>>> <https://stackoverflow.com/questions/38272798/bootstrap-confidence-interval-with-inf-in-final-estimates-boot-dplyr-package>
> >>>>> but
> >>>>> the explanation is not clear to me. I would like to publish these results
> >>>>> and I would like to know:
> >>>>>
> >>>>>       1. is this a sign that something is wrong? if yes, what is it?
> >>>>>       2. if nothing is wrong, what does the Inf mean and what's the best way
> >>>>>       to report it and plot it in a publication?
> >>>>>
> >>>>>
> >>>>> I also posted this question on Cross validated
> >>>>> https://stats.stackexchange.com/questions/491196/bootstrap-confidence-interval-with-mvrnorm-upper-value-equal-to-inf
> >>>>>
> >>>>> Thanks,
> >>>>> Alessandra
> >>>>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pred_plot.png
Type: image/png
Size: 5053 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20201019/ab1edfbf/attachment.png>


More information about the R-sig-mixed-models mailing list