# [R-sig-ME] Beta-Binomial Model Question

Alex Waldman
Sat Sep 4 22:28:26 CEST 2021

```
Thanks, this is helpful!

1. The larger denominator is just inherent to the different regions (the cervical, thoracic, and lumbar spinal cord are different sizes) and won't affect the precision of the answer. It is just used to normalize the lesional area so that I can garner an understanding of the differences between regions.

2. I do have a large proportion of 0s but not 1s (I don’t think 1s appear in the data at all). 0s represent a distinct category (i.e., cases that had no lesions in that particular region or of that particular lesion type).

Given those clarifications:

1. Would it be appropriate to directly use the tabulated proportions (i.e., lesional area/total area) in a beta model?
2. Which would be preferred: transforming the data and fitting a beta model without zero-inflation, or fitting a zero-inflated beta model? Is there a way to test this by measuring how much zero-inflation is actually present?
3. If I do move forward with the zero-inflated beta model, is it possible to test what terms should be included in the zero-inflated part of the model?

Thanks again for your help!

Warm Regards,
Alex

On 9/4/21, 2:33 AM, "R-sig-mixed-models on behalf of Ben Bolker" <r-sig-mixed-models-bounces using r-project.org on behalf of bbolker using gmail.com> wrote:

This is an interesting case.

Beta-binomial models are *not* really appropriate for non-integer
data; you'd be better off with a straight Beta model (see e.g. Smithson
and Verkuilen "A better lemon squeezer" 2006).

However, the Beta distribution has some disadvantages:

* you have to think about how to incorporate the effects of the
denominator in the model.  Does having a larger denominator affect the
precision of the answer in an obvious way?  I can't immediately think of
a sensible way to use an offset as one would in e.g. a Poisson model
(where you would include the total counts rather than counts/area, then
include log(area) in the model as an offset).

* The Beta doesn't incorporate zero and one counts naturally. glmmTMB
allows zero-inflation, but not zero-one inflation (brms will do this for
mixed models; zoib will for non-mixed models).  Do you have a large
proportion of zeros and ones? (If not, you can 'squeeze' them in a bit
as in Smithson & Verkuilen) Do 0/1 values represent censoring (below a
threshold for distinguishing from 0/1, i.e. measurement resolution) or a
distinct category?

The fact that you got very negative zero-inflation values suggests that
the model doesn't need them, but that will change when you go from
Beta-binomial to Beta (at which point *all* zeroes will be 'structural'
zeroes; if you use an intercept-only Z-I model (~1), this will basically
just be an estimate of the fraction of zeros in the data).

There is an experimental diagnose() function that's supposed to help
you interpret problems with the model fit ...

On 9/3/21 8:43 AM, Alex Waldman wrote:
> Dear All,
>
> My apologies for the additional question. I am working with data in which I have information on the proportional area taken by a lesion in different locations (region 1, 2, and 3) stratified by different lesion types (lesion type 1, 2, and 3). The denominator to derive the proportion will be different for each of the different locations. The data includes 0 and 1. Therefore, I was thinking of using a beta-binomial model. However, in my formula I included the proportion information as the response:
>
> glmmTMB::glmmTMB(LesionAreaRatio ~ Location*LesionType + (1 | ID), family=betabinomial, data=total_data_staged, REML=TRUE, control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
>
> I then got the following warnings:
>
>
>    1.  In eval(family$initialize) : non-integer #successes in a binomial glm!
>    2.  In fitTMB(TMBStruc) : Model convergence problem; extreme or very small eigenvalues detected. See vignette('troubleshooting')
>
> I looked in the vignette and this made me wonder if this would be the right model type to use since the proportions are not success/failure data per se but rather represent a normalized area?
>
> In addition, my data seems to be skewed toward 0s and I thought zero-inflation may be appropriate. When trying varying zero-inflation formulas I consistently received the following error (the eigenvalue error went away):
>
> In fitTMB(TMBStruc) :
> Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
>
> I looked in the vignette and noticed that the zero-inflation parameter was very negative no matter what I included in the zero-inflation model formula. I don’t get the Hessian matrix error if the zero-inflation is removed. Therefore, would that indicate that it is appropriate to leave out the zero-inflation?
>
> Thanks again for all your help as this is all new to me and I want to make sure I’m going down the right path and not unnecessarily overcomplicating things.
>
> Warm Regards,
> Alex
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics


```
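The two options discussed in the thread (squeezing the proportions into the open interval, or modelling the zeros separately) can be sketched in R as below. This is only an illustrative sketch, not the posters' analysis: the simulated `prop` vector, the helper name `squeeze`, and the 40% zero rate are all assumptions made up for the example. The squeeze formula is the one from Smithson & Verkuilen (2006), `(y * (n - 1) + 0.5) / n`. The glmmTMB part runs only if the package is installed and fits an intercept-only model, so it leaves out the `Location*LesionType` terms and the `(1 | ID)` random effect from the original call.

```r
# Hypothetical toy data (not the poster's): proportions with many exact zeros
# and no ones, mimicking "lesional area / total area".
set.seed(1)
n <- 200
prop <- ifelse(runif(n) < 0.4, 0, rbeta(n, 2, 8))

# How many exact boundary values are there?
frac_zero <- mean(prop == 0)
frac_one  <- mean(prop == 1)
print(c(zeros = frac_zero, ones = frac_one))

# Option A: Smithson & Verkuilen (2006) "squeeze", which maps [0, 1] into (0, 1)
# so that a plain Beta model can be fitted. `squeeze` is a hypothetical helper.
squeeze <- function(y, n = length(y)) (y * (n - 1) + 0.5) / n
prop_sq <- squeeze(prop)
print(range(prop_sq))

# Option B: zero-inflated Beta with an intercept-only zero-inflation model.
# As noted in the thread, the back-transformed Z-I intercept should then be
# close to the observed fraction of zeros.
if (requireNamespace("glmmTMB", quietly = TRUE)) {
  d <- data.frame(prop = prop)
  fit_zi <- glmmTMB::glmmTMB(prop ~ 1, ziformula = ~1,
                             family = glmmTMB::beta_family(), data = d)
  zi_hat <- plogis(glmmTMB::fixef(fit_zi)$zi[["(Intercept)"]])
  print(c(observed = frac_zero, estimated = zi_hat))
}
```

With real data, the intercept-only `ziformula = ~1` could be replaced by a formula with covariates, and candidate zero-inflation formulas compared on the same untransformed response.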