[R-sig-ME] confidence intervals with mvrnorm - upper value equal to inf

Thu Oct 15 22:38:01 CEST 2020

Thanks for the clarifications.

So if I understand right, model diagnostics can tell me if my model
parameters are multivariate normal. I think they are, see the dharma
diagnostic plot attached.

The values you requested:
 > X.cond[1,]
  (Intercept) Used_piNgersN
            1             0
>  beta.cond
  (Intercept) Used_piNgersN
    -19.77343      19.26693
> vcov(m1)$cond
              (Intercept) Used_piNgersN
(Intercept)     168344705    -168344704
Used_piNgersN  -168344704     168344705

I noted that STd.Error values in the conditional model are very high
as well (12974.8).
I run the same model but set family as nbinom2

m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
(1|tripID), ziformula = ~1,   data=x, family = "nbinom2")

and the Inf value do not appear. Was it an overdispersion problem? Can
I use the negative binomial model instead?

Sorry, I have tried to simulate my data to provide you with more
information but I am not sure I am doing it correctly. The effort
varies in each trial and I believe it can affect the value of
Dolphins.TOT.

On Wed, Oct 14, 2020 at 7:24 PM Ben Bolker <bbolker using gmail.com> wrote:
>
>    A few clarifying points below.
>
> On 10/14/20 3:11 PM, Alessandra Bielli wrote:
> > Hi Ben
> >
> > That's a good point, my parameters are not multivariate normal.
> > But then, why was this method used in the Salamander example, where
> > data are zero inflated and so, I assume, not multivariate normal?
>
>    The multivariate normality refers to the sampling distribution of the
> *model parameters*, not to the distribution (marginal or conditional of
> the response variable).  In general the sampling distribution of the
> model parameters is multivariate normal if the data are "well enough
> behaved" (i.e., either the responses themselves are normal *or* there is
> enough data, and enough information in the data, for the log-likelihood
> to become quadratic in a sufficiently large neighborhood of the MLE ...)
>
>
> >
> > I haven't simulated the data to share yet, but the "Inf" first appears
> > at the line >pred.ucount.psim =
> > exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
> > Only the first row, Used_piNgers=Y, shows "Inf".
>
>     Hmm.  That's a little surprising, it implies that pred.cond.psim is
> a large number (>700 or so, which for a log-scale parameter is huge).
> Can you show us X.cond[1,] , beta.cond, vcov(m1)$cond ?
> >
> > Thanks,
> > Alessandra
> >
> > On Sun, Oct 11, 2020 at 3:46 PM Ben Bolker <bbolker using gmail.com> wrote:
> >>
> >>     It's hard to troubleshoot this without a reproducible example. Unless
> >> the answer is obvious -- which it's not, to me -- the easiest way to
> >> troubleshoot is to work through the steps one at a time and see where
> >> the infinite values first appear.  Can you create such an example either
> >> by posting your data or by simulating data that looks like your data?
> >>
> >>     The posterior predictive simulation approach assumes that the
> >> sampling distributions of the parameters are multivariate normal, which
> >> is likely to be questionable in a low-information setting (which will be
> >> the case if you don't have very many non-zero values ...)
> >>
> >>
> >> On 10/10/20 7:28 AM, Alessandra Bielli wrote:
> >>> Dear list,
> >>>
> >>> I am trying to predict a value and CI for two different treatments from a
> >>> glmmTMB fitted model using posterior predictive simulations (mvrnorm
> >>> function in the MASS package) as in the Salamander example, Brooks 2017
> >>> appendix B
> >>> <https://www.biorxiv.org/content/biorxiv/suppl/2017/05/01/132753.DC1/132753-2.pdf>.
> >>> The dependent variable is a count, majority of values are zeros but some
> >>> positive values appear.
> >>>
> >>> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
> >>> (1|Trip_Code), ziformula =~1,
> >>>                 data=x, family = "truncated_poisson")
> >>>
> >>>       newdata0 = with(x,
> >>>                   expand.grid(
> >>>                     Used_piNgers = c("Y","N"),
> >>>                     Effort=1))
> >>>
> >>> X.cond = model.matrix(lme4::nobars(formula(m1)[-2]), newdata0)
> >>>
> >>> beta.cond = fixef(m1)$cond
> >>> pred.cond = X.cond %*% beta.cond
> >>> ziformula = m1$modelInfo$allForm$ziformula
> >>> X.zi = model.matrix(lme4::nobars(ziformula), newdata0)
> >>> beta.zi = fixef(m1)$zi
> >>> pred.zi = X.zi %*% beta.zi
> >>>
> >>> pred.ucount = exp(pred.cond)*(1-plogis(pred.zi))
> >>>
> >>>
> >>> set.seed(101)
> >>> pred.condpar.psim = mvrnorm(1000,mu=beta.cond,Sigma=vcov(m1)$cond)
> >>> pred.cond.psim = X.cond %*% t(pred.condpar.psim)
> >>> pred.zipar.psim = mvrnorm(1000,mu=beta.zi,Sigma=vcov(m1)$zi)
> >>> pred.zi.psim = X.zi %*% t(pred.zipar.psim)
> >>> pred.ucount.psim = exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
> >>> ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975)))
> >>> ci.ucount = data.frame(ci.ucount)
> >>> names(ci.ucount) = c("ucount.low","ucount.high")
> >>> pred.ucount = data.frame(newdata0, pred.ucount, ci.ucount)
> >>>
> >>> For my upper CI, I get a value equal to Inf:
> >>>
> >>> Used_piNgers Effort  pred.ucount ucount.low ucount.high
> >>> 1            Y      1 6.758889e-11 0.00000000         Inf
> >>> 2            N      1 1.575418e-02 0.00223033   0.1096139
> >>>
> >>> Is the Inf caused by the very low variability of values in my dataset? I
> >>> tried to lower the upper bound of the CI ci.ucount =
> >>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975))) and only when reaching
> >>> 0.475 ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.475))) I
> >>> obtained:
> >>>
> >>> Used_piNgers Effort  pred.ucount ucount.low  ucount.high
> >>> 1            Y      1 6.758889e-11 0.00000000 7.117465e+12
> >>> 2            N      1 1.575418e-02 0.00223033 1.454579e-02
> >>>
> >>> I found a related post
> >>> <https://stackoverflow.com/questions/38272798/bootstrap-confidence-interval-with-inf-in-final-estimates-boot-dplyr-package>
> >>> but
> >>> the explanation is not clear to me. I would like to publish these results
> >>> and I would like to know:
> >>>
> >>>      1. is this a sign that something is wrong? if yes, what is it?
> >>>      2. if nothing is wrong, what does the Inf mean and what's the best way
> >>>      to report it and plot it in a publication?
> >>>
> >>>
> >>> I also posted this question on Cross validated
> >>> https://stats.stackexchange.com/questions/491196/bootstrap-confidence-interval-with-mvrnorm-upper-value-equal-to-inf
> >>>
> >>> Thanks,
> >>> Alessandra
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models using r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models