[R-sig-ME] Beta-Binomial Model Question

Sat Sep 4 03:33:11 CEST 2021

   This is an interesting case.

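   Beta-binomial models are *not* really appropriate for non-integer 
data: they describe integer numbers of successes out of a known number 
of trials, which is what the "non-integer #successes" warning is 
complaining about. You'd be better off with a straight Beta model (see 
e.g. Smithson and Verkuilen, "A better lemon squeezer?", Psychological 
Methods, 2006).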
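A minimal sketch of the Beta alternative in glmmTMB follows. It assumes the response is the area proportion itself and lies strictly between 0 and 1. The data frame `dd` is simulated and hypothetical, standing in for the poster's `total_data_staged`, and the simulation settings are arbitrary.

```r
library(glmmTMB)

set.seed(101)
## Hypothetical simulated data standing in for total_data_staged:
## 30 subjects (ID), each measured at 3 locations x 3 lesion types
dd <- expand.grid(Location = factor(1:3), LesionType = factor(1:3),
                  ID = factor(1:30))
b_id <- rnorm(30, sd = 0.5)                  # subject-level random intercepts
eta  <- -1 + 0.3 * as.numeric(dd$Location) + b_id[dd$ID]
mu   <- plogis(eta)                          # mean proportion (logit link)
phi  <- 10                                   # Beta precision parameter
dd$LesionAreaRatio <- rbeta(nrow(dd), mu * phi, (1 - mu) * phi)

## Beta GLMM: the response must lie strictly inside (0, 1)
fit_beta <- glmmTMB(LesionAreaRatio ~ Location * LesionType + (1 | ID),
                    family = beta_family(link = "logit"),
                    data = dd)
summary(fit_beta)
```

Unlike the beta-binomial call, no denominator enters the model here, which is the issue raised in the first point below.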

   However, the Beta distribution has some disadvantages:

  * you have to think about how to incorporate the effect of the 
denominator in the model.  Does having a larger denominator affect the 
precision of the estimated proportion in an obvious way?  I can't 
immediately think of a sensible way to use an offset as one would in, 
e.g., a Poisson model (where you would model the total counts rather 
than counts/area, and include log(area) in the model as an offset).

  * The Beta doesn't naturally accommodate exact 0 and 1 values (its 
support is the open interval (0, 1)). glmmTMB allows zero-inflation, but 
not zero-one inflation (brms will do this for mixed models; zoib will 
for non-mixed models).  Do you have a large proportion of zeros and 
ones? (If not, you can 'squeeze' them in a bit as in Smithson & 
Verkuilen.) Do 0/1 values represent censoring (below a threshold for 
distinguishing from 0/1, i.e. measurement resolution) or a distinct 
category?
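Two of the devices mentioned in the list above can be sketched in base R. The first is the Smithson & Verkuilen 'squeeze', y' = (y(n - 1) + 0.5)/n with n the sample size, which pulls exact 0s and 1s slightly inside (0, 1). The second, for comparison, is the Poisson offset construction that has no obvious Beta counterpart. The function name `squeeze()` and the simulated `area`/`counts` data are hypothetical.

```r
## (1) Smithson & Verkuilen (2006) "squeeze": maps [0, 1] into (0, 1)
##     y' = (y * (n - 1) + 0.5) / n, with n the sample size
squeeze <- function(y, n = length(y)) {
  (y * (n - 1) + 0.5) / n
}

squeeze(c(0, 0.5, 1))   # n = 3, returns 1/6, 1/2, 5/6

## (2) For comparison: a Poisson model of total counts with log(area)
##     as an offset, so the coefficient is a rate per unit area
set.seed(104)
area   <- runif(500, 1, 10)                  # hypothetical areas
counts <- rpois(500, lambda = 2 * area)      # true rate: 2 per unit area
fit_pois <- glm(counts ~ 1 + offset(log(area)), family = poisson)
exp(coef(fit_pois))                          # estimated rate per unit area
```

With a large sample the squeeze moves interior values very little, so it mainly matters for the boundary values.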

The fact that you got very negative zero-inflation estimates (on the 
logit scale, i.e. zero-inflation probabilities close to 0) suggests that 
the model doesn't need zero-inflation, but that will change when you go 
from Beta-binomial to Beta (at which point *all* zeros will be 
'structural' zeros; if you use an intercept-only Z-I model (~1), its 
estimate will basically just be the fraction of zeros in the data).
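Why an intercept-only zero-inflation term tracks the fraction of zeros can be checked numerically. This base-R sketch assumes a Beta conditional distribution (which puts no probability on an exact 0) and a logit-linked zero-inflation intercept, as with `ziformula = ~1` in glmmTMB. The simulated `y` and the helper `zi_loglik()` are hypothetical. Only the Bernoulli part of the likelihood depends on the intercept b, so maximizing it gives plogis(b) = n0/n.

```r
## Hypothetical data: about 30% exact zeros, the rest strictly inside (0, 1)
set.seed(103)
y <- ifelse(runif(200) < 0.3, 0, rbeta(200, 2, 5))

## With a Beta conditional distribution, P(y == 0) = 0, so only the
## zero-inflation (Bernoulli) part of the likelihood involves the intercept b:
##   loglik(b) = n0 * log(p) + (n - n0) * log(1 - p),  p = plogis(b)
zi_loglik <- function(b, y) {
  p  <- plogis(b)
  n0 <- sum(y == 0)
  n0 * log(p) + (length(y) - n0) * log(1 - p)
}

b_hat <- optimize(zi_loglik, interval = c(-10, 10), y = y,
                  maximum = TRUE)$maximum
c(estimated_zi_prob = plogis(b_hat), fraction_zero = mean(y == 0))
```

If `y` contains no exact zeros, the log-likelihood keeps increasing as b decreases, so the estimate is pushed to the lower end of the search interval. This is one way a zero-inflation estimate can become very negative.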

  glmmTMB has an experimental diagnose() function that's supposed to 
help you interpret problems with the model fit ...
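A sketch of calling `diagnose()` on a fitted model follows. It assumes a glmmTMB version recent enough to include this (experimental) function. The data are simulated and hypothetical, and the exact wording of the report may differ between versions. A well-behaved fit like this one should not raise serious flags; on a problematic fit, the report is meant to point at things such as extreme parameter estimates or a badly behaved Hessian.

```r
library(glmmTMB)

set.seed(102)
## Hypothetical data: proportions clustered within 40 subjects
n_id <- 40
dd <- data.frame(ID = factor(rep(seq_len(n_id), each = 5)),
                 x  = rnorm(n_id * 5))
b  <- rnorm(n_id, sd = 0.4)                  # subject-level random intercepts
mu <- plogis(-0.5 + 0.8 * dd$x + b[dd$ID])   # mean proportion (logit link)
dd$y <- rbeta(nrow(dd), mu * 20, (1 - mu) * 20)

fit <- glmmTMB(y ~ x + (1 | ID), family = beta_family(), data = dd)

## Experimental check of the fitted model; prints a report of any
## problems it detects
diagnose(fit)
```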

On 9/3/21 8:43 AM, Alex Waldman wrote:
> Dear All,
> 
> My apologies for the additional question. I am working with data in which I have information on the proportional area taken by a lesion in different locations (region 1, 2, and 3) stratified by different lesion types (lesion type 1, 2, and 3). The denominator to derive the proportion will be different for each of the different locations. The data includes 0 and 1. Therefore, I was thinking of using a beta-binomial model. However, in my formula I included the proportion information as the response:
> 
> glmmTMB::glmmTMB(LesionAreaRatio ~ Location*LesionType + (1 | ID), family=betabinomial, data=total_data_staged, REML=TRUE, control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS")))
> 
> I then got the following warnings:
> 
> 
>    1.  In eval(family$initialize) : non-integer #successes in a binomial glm!
>    2.  In fitTMB(TMBStruc) : Model convergence problem; extreme or very small eigenvalues detected. See vignette('troubleshooting')
> 
> I looked in the vignette, and this made me wonder whether this is the right model type to use, since the proportions are not success/failure data per se but rather represent a normalized area.
> 
> In addition, my data seems to be skewed toward 0s and I thought zero-inflation may be appropriate. When trying varying zero-inflation formulas I consistently received the following error (the eigenvalue error went away):
> 
> In fitTMB(TMBStruc) :
> Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
> 
> I looked in the vignette and noticed that the zero-inflation parameter was very negative no matter what I included in the zero-inflation model formula. I don’t get the Hessian matrix error if the zero-inflation is removed. Therefore, would that indicate that it is appropriate to leave out the zero-inflation?
> 
> Thanks again for all your help as this is all new to me and I want to make sure I’m going down the right path and not unnecessarily overcomplicating things.
> 
> Warm Regards,
> Alex
> 
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics