[R-sig-ME] confidence intervals with mvrnorm - upper value equal to inf

Ben Bolker bbolker at gmail.com
Thu Oct 15 21:18:05 CEST 2020



On 10/15/20 4:38 PM, Alessandra Bielli wrote:
> Thanks for the clarifications.
> 
> So if I understand right, model diagnostics can tell me if my model
> parameters are multivariate normal. I think they are, see the DHARMa
> diagnostic plot attached.

    No, that's still not evaluating the multivariate normality of the 
sampling distribution of the model parameters (DHARMa plots assess the 
residuals, not the parameters).  Computing the likelihood profile is the 
main way to do that.
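    As a quick sketch (not run against your data), one way to get at this 
is to compare the Wald and profile confidence intervals; if they differ 
substantially, the multivariate-normal (quadratic) approximation is suspect:

    confint(m1, method = "wald")                      ## assumes a quadratic log-likelihood
    confint(m1, method = "profile", parm = "beta_")   ## profiles the fixed-effect parameters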
> 
> The values you requested:
> > X.cond[1,]
>   (Intercept) Used_piNgersN
>             1             0
> > beta.cond
>   (Intercept) Used_piNgersN
>     -19.77343      19.26693
> > vcov(m1)$cond
>               (Intercept) Used_piNgersN
> (Intercept)     168344705    -168344704
> Used_piNgersN  -168344704     168344705
> 
> I noted that the Std. Error values in the conditional model are very
> high as well (12974.8).
> I ran the same model but set the family to nbinom2
> 
> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
> (1|tripID), ziformula = ~1,   data=x, family = "nbinom2")
> 
> and the Inf values no longer appear. Was it an overdispersion problem? Can
> I use the negative binomial model instead?


   I would normally say this is an example of (quasi-)complete separation 
<http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#penalizationhandling-complete-separation> 
(large beta values, ridiculous Wald standard errors).
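    A quick way to check (just a sketch, assuming your data frame is x): 
cross-tabulate zero vs. non-zero counts against the pinger treatment; a 
treatment level with no non-zero counts at all is the classic signature 
of complete separation.

    with(x, table(Used_piNgers, Dolphins.TOT > 0))   ## a zero in the TRUE column for either level signals separation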

    But it may be overdispersion as well (since it disappears with nbinom2).
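    Either way, the Inf itself is just floating-point overflow: with a 
Wald standard error of sqrt(168344705) (about 12975) on the log scale, 
the mvrnorm draws for the intercept routinely exceed ~710, and exp() of 
anything above about 709.8 overflows to Inf in double precision.  A 
minimal sketch using the numbers you posted:

    se <- sqrt(168344705)        ## Wald SE of the intercept, about 12975
    exp(709); exp(710)           ## finite vs. Inf: the overflow threshold is ~709.8
    ## fraction of Normal(-19.77, se) draws beyond the overflow threshold:
    pnorm(710, mean = -19.77, sd = se, lower.tail = FALSE)   ## about 0.48

    That is also why you had to pull the upper quantile all the way down 
to 0.475 before the interval endpoint became finite.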

    Why are you using a hurdle model in one case (truncated Poisson) and 
a zero-inflation model (*non*-truncated nbinom2) in the other?
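    For reference, a sketch of the two specifications side by side 
(assuming the variable names from your calls above; note that you wrote 
(1|tripID) in one message and (1|Trip_Code) in the other):

    ## hurdle: truncated conditional distribution, so *all* zeros come from the zero model
    m_hurdle <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) + (1|tripID),
                        ziformula = ~1, data = x, family = "truncated_nbinom2")
    ## zero-inflation: a mixture, so zeros can come from either component
    m_zi <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) + (1|tripID),
                    ziformula = ~1, data = x, family = "nbinom2")

    Using the same conditional family in both (here nbinom2 vs. its 
truncated version) makes the hurdle vs. zero-inflation comparison cleaner.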




> 
> Sorry, I have tried to simulate my data to provide you with more
> information but I am not sure I am doing it correctly. The effort
> varies in each trial and I believe it can affect the value of
> Dolphins.TOT.
> 
> On Wed, Oct 14, 2020 at 7:24 PM Ben Bolker <bbolker using gmail.com> wrote:
>>
>>     A few clarifying points below.
>>
>> On 10/14/20 3:11 PM, Alessandra Bielli wrote:
>>> Hi Ben
>>>
>>> That's a good point, my parameters are not multivariate normal.
>>> But then, why was this method used in the Salamander example, where
>>> data are zero inflated and so, I assume, not multivariate normal?
>>
>>     The multivariate normality refers to the sampling distribution of the
>> *model parameters*, not to the distribution (marginal or conditional) of
>> the response variable.  In general the sampling distribution of the
>> model parameters is multivariate normal if the data are "well enough
>> behaved" (i.e., either the responses themselves are normal *or* there is
>> enough data, and enough information in the data, for the log-likelihood
>> to become approximately quadratic in a sufficiently large neighborhood
>> of the MLE ...)
>>
>>
>>>
>>> I haven't simulated the data to share yet, but the "Inf" first appears
>>> at the line >pred.ucount.psim =
>>> exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
>>> Only the first row, Used_piNgers=Y, shows "Inf".
>>
>>      Hmm.  That's a little surprising; it implies that pred.cond.psim
>> contains a large value (>700 or so, which is huge for a log-scale
>> parameter).  Can you show us X.cond[1,], beta.cond, and vcov(m1)$cond?
>>>
>>> Thanks,
>>> Alessandra
>>>
>>> On Sun, Oct 11, 2020 at 3:46 PM Ben Bolker <bbolker using gmail.com> wrote:
>>>>
>>>>      It's hard to troubleshoot this without a reproducible example. Unless
>>>> the answer is obvious -- which it's not, to me -- the easiest way to
>>>> troubleshoot is to work through the steps one at a time and see where
>>>> the infinite values first appear.  Can you create such an example either
>>>> by posting your data or by simulating data that looks like your data?
>>>>
>>>>      The posterior predictive simulation approach assumes that the
>>>> sampling distributions of the parameters are multivariate normal, which
>>>> is likely to be questionable in a low-information setting (which will be
>>>> the case if you don't have very many non-zero values ...)
>>>>
>>>>
>>>> On 10/10/20 7:28 AM, Alessandra Bielli wrote:
>>>>> Dear list,
>>>>>
>>>>> I am trying to predict a value and CI for two different treatments from a
>>>>> glmmTMB fitted model using posterior predictive simulations (mvrnorm
>>>>> function in the MASS package) as in the Salamander example, Brooks 2017
>>>>> appendix B
>>>>> <https://www.biorxiv.org/content/biorxiv/suppl/2017/05/01/132753.DC1/132753-2.pdf>.
>>>>> The dependent variable is a count; the majority of values are zeros, but
>>>>> some positive values appear.
>>>>>
>>>>> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
>>>>> (1|Trip_Code), ziformula =~1,
>>>>>                  data=x, family = "truncated_poisson")
>>>>>
>>>>>        newdata0 = with(x,
>>>>>                    expand.grid(
>>>>>                      Used_piNgers = c("Y","N"),
>>>>>                      Effort=1))
>>>>>
>>>>> X.cond = model.matrix(lme4::nobars(formula(m1)[-2]), newdata0)
>>>>>
>>>>> beta.cond = fixef(m1)$cond
>>>>> pred.cond = X.cond %*% beta.cond
>>>>> ziformula = m1$modelInfo$allForm$ziformula
>>>>> X.zi = model.matrix(lme4::nobars(ziformula), newdata0)
>>>>> beta.zi = fixef(m1)$zi
>>>>> pred.zi = X.zi %*% beta.zi
>>>>>
>>>>> pred.ucount = exp(pred.cond)*(1-plogis(pred.zi))
>>>>>
>>>>>
>>>>> set.seed(101)
>>>>> pred.condpar.psim = mvrnorm(1000,mu=beta.cond,Sigma=vcov(m1)$cond)
>>>>> pred.cond.psim = X.cond %*% t(pred.condpar.psim)
>>>>> pred.zipar.psim = mvrnorm(1000,mu=beta.zi,Sigma=vcov(m1)$zi)
>>>>> pred.zi.psim = X.zi %*% t(pred.zipar.psim)
>>>>> pred.ucount.psim = exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
>>>>> ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975)))
>>>>> ci.ucount = data.frame(ci.ucount)
>>>>> names(ci.ucount) = c("ucount.low","ucount.high")
>>>>> pred.ucount = data.frame(newdata0, pred.ucount, ci.ucount)
>>>>>
>>>>> For my upper CI, I get a value equal to Inf:
>>>>>
>>>>> Used_piNgers Effort  pred.ucount ucount.low ucount.high
>>>>> 1            Y      1 6.758889e-11 0.00000000         Inf
>>>>> 2            N      1 1.575418e-02 0.00223033   0.1096139
>>>>>
>>>>> Is the Inf caused by the very low variability of values in my dataset? I
>>>>> tried to lower the upper quantile of the CI, ci.ucount =
>>>>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975))), and only when it
>>>>> reached 0.475, ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.475))),
>>>>> did I obtain:
>>>>>
>>>>> Used_piNgers Effort  pred.ucount ucount.low  ucount.high
>>>>> 1            Y      1 6.758889e-11 0.00000000 7.117465e+12
>>>>> 2            N      1 1.575418e-02 0.00223033 1.454579e-02
>>>>>
>>>>> I found a related post
>>>>> <https://stackoverflow.com/questions/38272798/bootstrap-confidence-interval-with-inf-in-final-estimates-boot-dplyr-package>
>>>>> but
>>>>> the explanation is not clear to me. I would like to publish these results
>>>>> and I would like to know:
>>>>>
>>>>>       1. is this a sign that something is wrong? if yes, what is it?
>>>>>       2. if nothing is wrong, what does the Inf mean and what's the best way
>>>>>       to report it and plot it in a publication?
>>>>>
>>>>>
>>>>> I also posted this question on Cross validated
>>>>> https://stats.stackexchange.com/questions/491196/bootstrap-confidence-interval-with-mvrnorm-upper-value-equal-to-inf
>>>>>
>>>>> Thanks,
>>>>> Alessandra
>>>>>


