[R-sig-ME] confidence intervals with mvrnorm - upper value equal to inf

Ben Bolker bbolker at gmail.com
Mon Oct 19 17:26:47 CEST 2020



On 10/19/20 6:39 PM, Alessandra Bielli wrote:
> You are right, I missed the *truncated* in the nbinom2 model.
> In principle, I wanted to use a hurdle model because we are talking
> about how many individuals are entangled in a net, and some zeros are
> caused by the presence of a treatment (reducing entanglement) while
> other zeros are caused by the absence of individuals in the water. I
> thought a hurdle model would be appropriate.

    Honestly, this sounds more like a zero-inflated model than a hurdle 
model to me. ZI is appropriate when the zero observations are a 
*mixture* of different sources: 'structural' zeros (absence of 
individuals, so the observation will always be zero) and 'sampling' 
zeros (individuals are there but sometimes none are caught; zeros from 
this source are more frequent when the conditional catch rate is low 
due to the treatment).
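
For concreteness, a minimal sketch of the two forms in glmmTMB, using the
model structure quoted elsewhere in this thread (the data frame x and the
object names m_zip / m_hurdle here are illustrative):

library(glmmTMB)

## zero-inflated Poisson: zeros can arise either from the zero-inflation
## (binary) part or from the conditional Poisson part
m_zip <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) + (1|tripID),
                 ziformula = ~1, family = "poisson", data = x)

## hurdle / truncated Poisson: the conditional part describes only the
## positive counts, so *all* zeros are attributed to the binary part
m_hurdle <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) + (1|tripID),
                    ziformula = ~1, family = "truncated_poisson", data = x)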


> 
> I now tested both truncated and non-truncated nbinom and Poisson
> models; the non-truncated Poisson is the top model in terms of AIC and
> its diagnostic plots are good.
>         dAIC df
> m1.zip   0.0  4
> m1.nb   16.7  5
> m1.zitp 17.7  4
> m1.tnb  20.8  5

    It's a little weird that the AIC differences are so big; if you were 
comparing e.g. ZIP to ZINB, the ZINB should be at worst 2 AIC units 
higher (since it uses one additional parameter [unless dispformula is 
being used] and the log-likelihood can't be lower).
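
As a quick sanity check (a sketch, assuming m1.zip, m1.nb, m1.zitp and
m1.tnb are the fitted glmmTMB objects behind the table above, and that the
pairs really are nested as intended):

## AIC = -2*logLik + 2*df; for nested fits the larger model's maximized
## log-likelihood cannot be lower, so its AIC can be worse by at most
## 2 units per extra parameter
logLik(m1.zip)
logLik(m1.nb)
AIC(m1.zip, m1.nb, m1.zitp, m1.tnb)

A much larger gap than that usually means the models are not nested the way
you think, or one of the fits has not converged properly.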

> 
> I get sensible CI values this time (pred_plot attached).
> It seems to me that this model fits the data well, does it make sense
> to use this instead of the hurdle model?

  Sure (to me)

> I have researched online about ways to compute the likelihood profile
> for a zero inflated model, but I haven't found any help. Is there any
> resource you'd recommend?
> 
> Going back to the Inf values, would (quasi) complete separation be
> possible for models other than binomial, though? My response is not
> binary, it is a count and I have some positive values different from
> 1.
> I used the same code for the 4 model tested, and only the truncated
> models (poisson and nbinom2) predict Inf CI values.

   Something like complete separation could be possible: (1) the 
zero-inflation gives a binary component, where separation can occur; 
(2) if one treatment combination has all zeros in the Poisson 
component, then the estimated log-mean goes to -Inf.
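
A toy illustration of point (2), using a plain Poisson glm on made-up
counts rather than the actual data: when one level of a two-level factor
has no positive counts at all, its estimated (log-scale) effect runs off
toward -Inf and the Wald standard error explodes -- the same pattern as the
huge coefficients and variances quoted further down in this thread.

## hypothetical data: treatment "Y" has only zeros
set.seed(101)
d <- data.frame(trt = factor(rep(c("N", "Y"), each = 10)),
                y   = c(rpois(10, 2), rep(0, 10)))
fit <- glm(y ~ trt, family = poisson, data = d)
summary(fit)$coefficients
## trtY comes back as a very large negative estimate with an enormous
## standard error (possibly with a convergence warning) -- the count-model
## analogue of complete separation in a logistic regression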

> 
> Have a nice start of the week!
> 
> Alessandra
> 
> 
> On Thu, Oct 15, 2020 at 1:18 PM Ben Bolker <bbolker using gmail.com> wrote:
>>
>>
>>
>> On 10/15/20 4:38 PM, Alessandra Bielli wrote:
>>> Thanks for the clarifications.
>>>
>>> So if I understand right, model diagnostics can tell me if my model
>>> parameters are multivariate normal. I think they are; see the DHARMa
>>> diagnostic plot attached.
>>
>>      No, that's still not evaluating the multivariate normality of the
>> sampling distribution of the model parameters.  Computing the likelihood
>> profile is the main way to do that.
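
For reference, glmmTMB can compute both kinds of intervals directly; a
sketch, assuming the fitted model is the m1 discussed in this thread:

## Wald intervals rely on the quadratic/MVN approximation;
## profile intervals recompute the likelihood and do not
confint(m1, method = "wald")
confint(m1, method = "profile")   # can be slow

Large disagreement between the two is a warning sign that the Wald /
mvrnorm approach is not trustworthy here.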
>>>
>>> The values you requested:
>>> > X.cond[1,]
>>>   (Intercept) Used_piNgersN
>>>             1             0
>>> > beta.cond
>>>   (Intercept) Used_piNgersN
>>>     -19.77343      19.26693
>>> > vcov(m1)$cond
>>>               (Intercept) Used_piNgersN
>>> (Intercept)     168344705    -168344704
>>> Used_piNgersN  -168344704     168344705
>>>
>>> I noted that the Std. Error values in the conditional model are very high
>>> as well (12974.8).
>>> I ran the same model but set the family to nbinom2
>>>
>>> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
>>> (1|tripID), ziformula = ~1,   data=x, family = "nbinom2")
>>>
>>> and the Inf values do not appear. Was it an overdispersion problem? Can
>>> I use the negative binomial model instead?
>>
>>
>>     I would normally say this is an example of (quasi)complete separation
>> <http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#penalizationhandling-complete-separation>
>> (large beta values, ridiculous Wald standard errors)
>>
>>      But it may be overdispersion as well (since it disappears with nbinom2).
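
One way to check the overdispersion possibility more directly is with the
DHARMa package already used for the diagnostic plots earlier in this thread
(a sketch, assuming the truncated-Poisson fit is m1):

library(DHARMa)
## simulate scaled residuals from the fitted model, then test
sim <- simulateResiduals(m1, n = 1000)
testDispersion(sim)      # over/underdispersion relative to the model
testZeroInflation(sim)   # observed vs simulated number of zeros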
>>
>>      Why are you using a hurdle model in one case (truncated Poisson) and
>> a zero-inflation model (*non*-truncated nbinom2) in the other?
>>
>>
>>
>>
>>>
>>> Sorry, I have tried to simulate my data to provide you with more
>>> information but I am not sure I am doing it correctly. The effort
>>> varies in each trial and I believe it can affect the value of
>>> Dolphins.TOT.
>>>
>>> On Wed, Oct 14, 2020 at 7:24 PM Ben Bolker <bbolker using gmail.com> wrote:
>>>>
>>>>      A few clarifying points below.
>>>>
>>>> On 10/14/20 3:11 PM, Alessandra Bielli wrote:
>>>>> Hi Ben
>>>>>
>>>>> That's a good point, my parameters are not multivariate normal.
>>>>> But then, why was this method used in the Salamander example, where
>>>>> data are zero inflated and so, I assume, not multivariate normal?
>>>>
>>>>      The multivariate normality refers to the sampling distribution of the
>>>> *model parameters*, not to the distribution (marginal or conditional) of
>>>> the response variable.  In general the sampling distribution of the
>>>> model parameters is multivariate normal if the data are "well enough
>>>> behaved" (i.e., either the responses themselves are normal *or* there is
>>>> enough data, and enough information in the data, for the log-likelihood
>>>> to become quadratic in a sufficiently large neighborhood of the MLE ...)
>>>>
>>>>
>>>>>
>>>>> I haven't simulated the data to share yet, but the "Inf" first appears
>>>>> at the line >pred.ucount.psim =
>>>>> exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
>>>>> Only the first row, Used_piNgers=Y, shows "Inf".
>>>>
>>>>       Hmm.  That's a little surprising; it implies that pred.cond.psim
>>>> contains a large value (>700 or so, which for a log-scale parameter is huge).
>>>> Can you show us X.cond[1,], beta.cond, and vcov(m1)$cond ?
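
For what it's worth, the Inf itself is just double-precision overflow in
exp(): anything much above ~709 on the log scale overflows, and draws from
a normal distribution with a standard deviation around 13000 (the square
root of the 1.68e8 variance shown earlier in the thread) reach that range
easily. A tiny sketch:

exp(709)   # about 8.2e307, still finite
exp(710)   # Inf: overflows double precision
set.seed(101)
## simulated coefficient draws with the Wald SD reported above
range(rnorm(1000, mean = 19.3, sd = sqrt(168344705)))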
>>>>>
>>>>> Thanks,
>>>>> Alessandra
>>>>>
>>>>> On Sun, Oct 11, 2020 at 3:46 PM Ben Bolker <bbolker using gmail.com> wrote:
>>>>>>
>>>>>>       It's hard to troubleshoot this without a reproducible example. Unless
>>>>>> the answer is obvious -- which it's not, to me -- the easiest way to
>>>>>> troubleshoot is to work through the steps one at a time and see where
>>>>>> the infinite values first appear.  Can you create such an example either
>>>>>> by posting your data or by simulating data that looks like your data?
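
A sketch of that step-by-step check, assuming the objects
(pred.ucount.psim, pred.cond.psim) created by the code quoted further down
in this thread:

## which cells of the simulated predictions are non-finite?
which(!is.finite(pred.ucount.psim), arr.ind = TRUE)
## and for the offending row, inspect the intermediate pieces:
range(pred.cond.psim[1, ])                  # conditional linear predictor
mean(!is.finite(exp(pred.cond.psim[1, ])))  # fraction of draws that overflow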
>>>>>>
>>>>>>       The posterior predictive simulation approach assumes that the
>>>>>> sampling distributions of the parameters are multivariate normal, which
>>>>>> is likely to be questionable in a low-information setting (which will be
>>>>>> the case if you don't have very many non-zero values ...)
>>>>>>
>>>>>>
>>>>>> On 10/10/20 7:28 AM, Alessandra Bielli wrote:
>>>>>>> Dear list,
>>>>>>>
>>>>>>> I am trying to predict a value and CI for two different treatments from a
>>>>>>> glmmTMB fitted model using posterior predictive simulations (mvrnorm
>>>>>>> function in the MASS package) as in the Salamander example, Brooks 2017
>>>>>>> appendix B
>>>>>>> <https://www.biorxiv.org/content/biorxiv/suppl/2017/05/01/132753.DC1/132753-2.pdf>.
>>>>>>> The dependent variable is a count; the majority of values are zeros but
>>>>>>> some positive values appear.
>>>>>>>
>>>>>>> library(glmmTMB)  # for glmmTMB()
>>>>>>> library(MASS)     # for mvrnorm()
>>>>>>>
>>>>>>> m1 <- glmmTMB(Dolphins.TOT ~ Used_piNgers + offset(log(Effort)) +
>>>>>>> (1|Trip_Code), ziformula =~1,
>>>>>>>                   data=x, family = "truncated_poisson")
>>>>>>>
>>>>>>>         newdata0 = with(x,
>>>>>>>                     expand.grid(
>>>>>>>                       Used_piNgers = c("Y","N"),
>>>>>>>                       Effort=1))
>>>>>>>
>>>>>>> X.cond = model.matrix(lme4::nobars(formula(m1)[-2]), newdata0)
>>>>>>>
>>>>>>> beta.cond = fixef(m1)$cond
>>>>>>> pred.cond = X.cond %*% beta.cond
>>>>>>> ziformula = m1$modelInfo$allForm$ziformula
>>>>>>> X.zi = model.matrix(lme4::nobars(ziformula), newdata0)
>>>>>>> beta.zi = fixef(m1)$zi
>>>>>>> pred.zi = X.zi %*% beta.zi
>>>>>>>
>>>>>>> pred.ucount = exp(pred.cond)*(1-plogis(pred.zi))
>>>>>>>
>>>>>>>
>>>>>>> set.seed(101)
>>>>>>> pred.condpar.psim = mvrnorm(1000,mu=beta.cond,Sigma=vcov(m1)$cond)
>>>>>>> pred.cond.psim = X.cond %*% t(pred.condpar.psim)
>>>>>>> pred.zipar.psim = mvrnorm(1000,mu=beta.zi,Sigma=vcov(m1)$zi)
>>>>>>> pred.zi.psim = X.zi %*% t(pred.zipar.psim)
>>>>>>> pred.ucount.psim = exp(pred.cond.psim)*(1-plogis(pred.zi.psim))
>>>>>>> ci.ucount = t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975)))
>>>>>>> ci.ucount = data.frame(ci.ucount)
>>>>>>> names(ci.ucount) = c("ucount.low","ucount.high")
>>>>>>> pred.ucount = data.frame(newdata0, pred.ucount, ci.ucount)
>>>>>>>
>>>>>>> For my upper CI, I get a value equal to Inf:
>>>>>>>
>>>>>>> Used_piNgers Effort  pred.ucount ucount.low ucount.high
>>>>>>> 1            Y      1 6.758889e-11 0.00000000         Inf
>>>>>>> 2            N      1 1.575418e-02 0.00223033   0.1096139
>>>>>>>
>>>>>>> Is the Inf caused by the very low variability of values in my dataset? I
>>>>>>> tried to lower the upper bound of the CI in ci.ucount =
>>>>>>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.975))), and only when I
>>>>>>> lowered it to 0.475, i.e. ci.ucount =
>>>>>>> t(apply(pred.ucount.psim,1,quantile,c(0.025,0.475))), did I obtain:
>>>>>>>
>>>>>>> Used_piNgers Effort  pred.ucount ucount.low  ucount.high
>>>>>>> 1            Y      1 6.758889e-11 0.00000000 7.117465e+12
>>>>>>> 2            N      1 1.575418e-02 0.00223033 1.454579e-02
>>>>>>>
>>>>>>> I found a related post
>>>>>>> <https://stackoverflow.com/questions/38272798/bootstrap-confidence-interval-with-inf-in-final-estimates-boot-dplyr-package>
>>>>>>> but the explanation is not clear to me. I would like to publish these
>>>>>>> results and I would like to know:
>>>>>>>
>>>>>>>        1. is this a sign that something is wrong? if yes, what is it?
>>>>>>>        2. if nothing is wrong, what does the Inf mean and what's the best way
>>>>>>>        to report it and plot it in a publication?
>>>>>>>
>>>>>>>
>>>>>>> I also posted this question on Cross validated
>>>>>>> https://stats.stackexchange.com/questions/491196/bootstrap-confidence-interval-with-mvrnorm-upper-value-equal-to-inf
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alessandra
>>>>>>>


