[R-sig-ME] Beta-Binomial Model Question

Sat Sep 4 22:46:45 CEST 2021

On 9/4/21 4:28 PM, Alex Waldman wrote:
> Thanks this is helpful!
> 
> 1. The larger denominator is just inherent to the different regions (the cervical, thoracic, and lumbar spinal cord are different sizes) and won't affect the precision of the answer. It is just used to normalize the lesional area so that I can garner an understanding of the differences between regions.

  OK.
> 
> 2. I do have a large proportion of 0s but not 1s (I don’t even think 1s appear in the data). 0s represent a distinct category (i.e., cases that had no lesions at that particular region or of a particular lesion type).
> 
> Given those clarifications:
> 
> 1. Would it be appropriate to directly use the tabulated proportions (i.e., lesional area/total area) in a beta model?

   Yes.

> 2. Would transforming the data and using the beta model without zero inflation or using the zero-inflated beta be preferred? Is there a way to test this by detecting how much zero-inflation is actually present?

   By "transforming" do you mean computing the proportions?

   If you have zeros in your data and want to fit a Beta model, you *have* to 
use zero-inflation; the likelihood of a zero response under a Beta model 
is either zero (-infinity for log-likelihood, when shape1 > 1) or infinite 
(when shape1 < 1), except in the special case where shape1 = 1 ...
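
   A quick base-R check of that statement, as a minimal sketch: the 
shape values below are arbitrary illustrations, not estimates from this 
data set.

```r
# Beta density at y = 0 for three illustrative shape1 values (shape2 fixed at 3)
shape2 <- 3
dens_gt1 <- dbeta(0, shape1 = 2,   shape2 = shape2)  # shape1 > 1: density is 0
dens_lt1 <- dbeta(0, shape1 = 0.5, shape2 = shape2)  # shape1 < 1: density is Inf
dens_eq1 <- dbeta(0, shape1 = 1,   shape2 = shape2)  # shape1 = 1: finite (equals shape2)

# Log-likelihood contribution of a single zero observation in each case
loglik_at_zero <- dbeta(0, shape1 = c(2, 0.5, 1), shape2 = shape2, log = TRUE)
print(loglik_at_zero)
```

   Only the shape1 = 1 case gives a finite, usable log-likelihood, which 
is why exact zeros cannot be fed to a plain Beta model.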

   A zero-inflated Beta model is essentially two separate models fitted 
together for convenience (a hurdle model); that is, prob(zero or 
not-zero) is fitted with a binomial/logistic model, and the conditional 
distribution (i.e., fitting the model to the non-zero values only) is 
fitted with a Beta model.  In other words, since there are no zeros in a 
Beta distribution, a zero in the data is by definition "inflated" (a 
structural zero, sometimes called a "true zero" although I hate that 
terminology).
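
   To make the "two separate models" idea concrete, here is a rough 
base-R sketch on simulated data. Everything in it (the covariate x, the 
coefficient values, the precision of 10, the helper negloglik_beta) is 
made up for illustration; it is not the glmmTMB implementation, just the 
same decomposition done by hand without random effects.

```r
set.seed(101)
n <- 500
x <- rnorm(n)

# Simulate: a zero with probability p_zero, otherwise a Beta draw
p_zero <- plogis(-0.5 + 1.0 * x)
mu     <- plogis(-1.0 + 0.5 * x)   # mean of the non-zero part
phi    <- 10                       # precision of the non-zero part
y <- ifelse(rbinom(n, 1, p_zero) == 1, 0,
            rbeta(n, mu * phi, (1 - mu) * phi))

# Part 1: logistic regression for zero vs. non-zero
zero <- as.integer(y == 0)
fit_zero <- glm(zero ~ x, family = binomial)

# Part 2: Beta regression on the non-zero values only
# (logit link for the mean, log link for the precision)
nz <- y > 0
negloglik_beta <- function(par, y, x) {
  mu  <- plogis(par[1] + par[2] * x)
  phi <- exp(par[3])
  -sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}
fit_beta <- optim(c(0, 0, 0), negloglik_beta,
                  y = y[nz], x = x[nz], method = "BFGS")

coef(fit_zero)   # zero-part coefficients (logit scale)
fit_beta$par     # Beta-part intercept, slope, log(precision)
```

   The two fits share no parameters, so fitting them jointly or 
separately gives the same estimates; doing it in one call is just more 
convenient.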

    "How much zero-inflation is actually present" is just the proportion 
of zeros in the data ... as long as you have any zeros in your data, it 
doesn't really make sense to test "whether" the model is zero-inflated.
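
   As a small illustration (the values below are invented, not taken 
from the lesion data), the back-transformed intercept of an 
intercept-only logistic model for "is zero" matches the raw proportion 
of zeros:

```r
# Hypothetical proportions with some exact zeros (made-up values)
y <- c(0, 0, 0.12, 0.30, 0, 0.05, 0.41, 0, 0.22, 0.08)

# Raw fraction of zeros in the data
prop_zero <- mean(y == 0)

# Intercept-only logistic model for "is zero", back-transformed to a probability
fit0 <- glm(as.integer(y == 0) ~ 1, family = binomial)
est_zero <- unname(plogis(coef(fit0)))

c(prop_zero = prop_zero, est_zero = est_zero)
```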

> 3. If I do move forward with the zero-inflated beta model, is it possible to test what terms should be included in the zero-inflated part of the model?

   Absolutely.  Your model would be

   glmmTMB(response ~ <stuff>, ziformula = ~ <other stuff>, family = beta_family(), ...)

  And you can test the effects of different predictors on the 
probability of a zero response in any of the usual ways (Wald p-values 
in summary(), comparing nested models via LRT/anova(), profile 
confidence intervals ...)
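
   A hedged sketch of what that could look like, using simulated data 
(the data frame d, its columns, and all coefficient values are 
hypothetical stand-ins, not the poster's data; it assumes a glmmTMB 
version that accepts zeros in a Beta response when ziformula is given):

```r
library(glmmTMB)

set.seed(202)
n_id <- 40; n_per <- 6
d <- data.frame(
  ID       = factor(rep(seq_len(n_id), each = n_per)),
  Location = factor(rep(c("r1", "r2", "r3"), times = n_id * n_per / 3))
)

# Simulated response: zeros are more likely in region r3 (assumed effect)
re   <- rnorm(n_id, sd = 0.3)[as.integer(d$ID)]
mu   <- plogis(-1 + 0.4 * (d$Location == "r2") + re)
p0   <- plogis(-0.5 + 1.2 * (d$Location == "r3"))
d$y  <- ifelse(rbinom(nrow(d), 1, p0) == 1, 0,
               rbeta(nrow(d), mu * 8, (1 - mu) * 8))

# Nested zero-inflation formulas: intercept-only vs. Location-dependent
fit_zi1 <- glmmTMB(y ~ Location + (1 | ID), ziformula = ~ 1,
                   family = beta_family(), data = d)
fit_ziL <- glmmTMB(y ~ Location + (1 | ID), ziformula = ~ Location,
                   family = beta_family(), data = d)

summary(fit_ziL)$coefficients$zi  # Wald z-tests for the zero-inflation terms
anova(fit_zi1, fit_ziL)           # likelihood-ratio test of the nested models
confint(fit_ziL)                  # Wald intervals by default;
                                  # method = "profile" is slower
```

   Note that the zero-inflation coefficients are on the logit scale and 
describe the probability of a zero, so a positive estimate means *more* 
zeros.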
> 
> Thanks again for your help!
> 
> Warm Regards,
> Alex
> 
> On 9/4/21, 2:33 AM, "R-sig-mixed-models on behalf of Ben Bolker" <r-sig-mixed-models-bounces using r-project.org on behalf of bbolker using gmail.com> wrote:
> 
>         This is an interesting case.
> 
>         Beta-binomial models are *not* really appropriate for non-integer
>      data; you'd be better off with a straight Beta model (see e.g. Smithson
>      and Verkuilen "A better lemon squeezer" 2006).
> 
>         However, the Beta distribution has some disadvantages:
> 
>        * you have to think about how to incorporate the effects of the
>      denominator in the model.  Does having a larger denominator affect the
>      precision of the answer in an obvious way?  I can't immediately think of
>      a sensible way to use an offset as one would in e.g. a Poisson model
>      (where you would include the total counts rather than counts/area, then
>      include log(area) in the model as an offset).
> 
>        * The Beta doesn't incorporate zero and one values naturally. glmmTMB
>      allows zero-inflation, but not zero-one inflation (brms will do this for
>      mixed models; zoib will for non-mixed models).  Do you have a large
>      proportion of zeros and ones? (If not, you can 'squeeze' them in a bit
>      as in Smithson & Verkuilen) Do 0/1 values represent censoring (below a
>      threshold for distinguishing from 0/1, i.e. measurement resolution) or a
>      distinct category?
> 
>      The fact that you got very negative zero-inflation values suggests that
>      the model doesn't need them, but that will change when you go from
>      Beta-binomial to Beta (at which point *all* zeroes will be 'structural'
>      zeroes; if you use an intercept-only Z-I model (~1), this will basically
>      just be an estimate of the fraction of zeros in the data).
> 
>        There is an experimental diagnose() function that's supposed to help
>      you interpret problems with the model fit ...
> 
> 
> 
>      On 9/3/21 8:43 AM, Alex Waldman wrote:
>      > Dear All,
>      >
>      > My apologies for the additional question. I am working with data in which I have information on the proportional area taken by a lesion in different locations (region 1, 2, and 3) stratified by different lesion types (lesion type 1, 2, and 3). The denominator to derive the proportion will be different for each of the different locations. The data includes 0 and 1. Therefore, I was thinking of using a beta-binomial model. However, in my formula I included the proportion information as the response:
>      >
>      > glmmTMB::glmmTMB(LesionAreaRatio ~ Location*LesionType + (1 | ID), family=betabinomial, data=total_data_staged, REML=TRUE, control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
>      >
>      > I then got the following warnings:
>      >
>      >
>      >    1.  In eval(family$initialize) : non-integer #successes in a binomial glm!
>      >    2.  In fitTMB(TMBStruc) : Model convergence problem; extreme or very small eigenvalues detected. See vignette('troubleshooting')
>      >
>      > I looked in the vignette and this made me wonder if this would be the right model type to use since the proportions are not success/failure data per se but rather represent a normalized area?
>      >
>      > In addition, my data seems to be skewed toward 0s and I thought zero-inflation may be appropriate. When trying varying zero-inflation formulas I consistently received the following error (the eigenvalue error went away):
>      >
>      > In fitTMB(TMBStruc) :
>      > Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
>      >
>      > I looked in the vignette and noticed that the zero-inflation parameter was very negative no matter what I included in the zero-inflation model formula. I don’t get the Hessian matrix error if the zero-inflation is removed. Therefore, would that indicate that it is appropriate to leave out the zero-inflation?
>      >
>      > Thanks again for all your help as this is all new to me and I want to make sure I’m going down the right path and not unnecessarily overcomplicating things.
>      >
>      > Warm Regards,
>      > Alex
>      >
>      > _______________________________________________
>      > R-sig-mixed-models using r-project.org mailing list
>      > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>      >
> 
>      --
>      Dr. Benjamin Bolker
>      Professor, Mathematics & Statistics and Biology, McMaster University
>      Director, School of Computational Science and Engineering
>      Graduate chair, Mathematics & Statistics
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics