[R-sig-ME] [EXT] Re: Too high condition R-square value - beta family

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Tue Nov 29 23:13:43 CET 2022


   This gets tricky (and possibly farther into the weeds than the OP is 
interested in).

   tl;dr provided everyone is using the right components of the model 
output in the right places, these two different definitions don't 
necessarily represent a problem.

    The $variance component of 'family' objects in R (as produced by 
functions such as gaussian(), Gamma(), etc.) gives only the component of 
the variance that depends on the mean: for example, 
gaussian()$variance(mu) returns a vector of all 1s.  (The reason for this 
goes back to the classical definitions of generalized linear models, 
where the dispersion parameter [the scaling factor of the variance that 
is *independent* of the mean] is a nuisance parameter that can be 
ignored for many purposes.)  If you want the conditional variance of a 
prediction, you typically need to scale the $variance() output by a 
dispersion value. You can get this by running sigma() on the model, 
although for glmmTMB families you need to check `?sigma.glmmTMB`: in the 
case of the Beta family I think you need 
$variance(predicted_mu)/(1+sigma(fitted_model)).
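
A minimal base-R sketch of the point above, to show what $variance() 
does and does not include. The values of `mu` and `phi` are made up for 
illustration; with a real glmmTMB beta fit, `phi` would (I believe) be 
what sigma(fitted_model) returns, but check `?sigma.glmmTMB`.

```r
# Hypothetical predicted means, for illustration only
mu <- c(0.2, 0.5, 0.8)

# $variance() gives only the mean-dependent part of the variance
v_gauss <- gaussian()$variance(mu)   # constant: no dependence on the mean
v_binom <- binomial()$variance(mu)   # mu * (1 - mu)

# Beta family: the mean-dependent part is also mu * (1 - mu);
# the conditional variance additionally divides by (1 + phi)
phi <- 4                             # hypothetical dispersion (precision) value
v_mean_part <- mu * (1 - mu)
v_cond <- v_mean_part / (1 + phi)

print(v_gauss)
print(v_mean_part)
print(v_cond)
```

The two quantities differ by the constant factor 1/(1 + phi), so which 
one is "right" depends on whether the caller applies that factor itself.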


More discussion:

* https://github.com/glmmTMB/glmmTMB/issues/294

* https://github.com/glmmTMB/glmmTMB/issues/169#issuecomment-676086686 
(you're asking the same question here!)
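
As a quick sanity check on the V = mu(1-mu)/(1+phi) formula quoted below 
from ?glmmTMB::beta_family, here is a small simulation sketch. It assumes 
the Ferrari and Cribari-Neto parameterization, i.e. shape1 = mu*phi and 
shape2 = (1-mu)*phi; the values of `mu` and `phi` are arbitrary.

```r
set.seed(101)

mu  <- 0.6    # arbitrary mean
phi <- 10     # arbitrary precision parameter

# Draw from a Beta distribution with mean mu and precision phi
x <- rbeta(1e5, shape1 = mu * phi, shape2 = (1 - mu) * phi)

v_theory    <- mu * (1 - mu) / (1 + phi)  # full conditional variance
v_mean_part <- mu * (1 - mu)              # what $variance(mu) would return
v_sim       <- var(x)

# The simulated variance should be close to v_theory, not v_mean_part
print(c(simulated = v_sim, theory = v_theory, mean_part = v_mean_part))
```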


On 2022-11-29 4:50 PM, Daniel Lüdecke wrote:
> It can be that the calculation of the random effects variances is not accurate. The code in the *insight* package (which is used by performance::r2()) has "mu * (1 - mu) / (1 + phi)" to calculate the distributional variance; glmmTMB::beta_family()$variance, however, returns "mu * (1 - mu)". The docs in ?glmmTMB::beta_family, again, say: "Beta distribution: parameterization of Ferrari and Cribari-Neto (2004) and the betareg package (Cribari-Neto and Zeileis 2010); V=μ(1−μ)/(ϕ+1)" (which is what is used in *insight*).
>
> I'm not sure that this is the issue, but it might be. Would be good to know which of the two formulas is the correct / more accurate one.
>
> -----Original Message-----
> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On Behalf Of Ben Bolker
> Sent: Tuesday, 29 November 2022 22:01
> To: r-sig-mixed-models using r-project.org
> Subject: [EXT] Re: [R-sig-ME] Too high condition R-square value - beta family
>
>      Thanks.  Can you please post the results of summary() applied to
> your fitted model?  That could give us some more clues ...
>
> On 2022-11-29 3:41 PM, camille.montalcini using unibe.ch wrote:
>> Dear list members,
>>
>> I am using glmmTMB to fit a beta family (with log link) to some proportion data (varying from 0-1, which I rescaled from 0.01 to 0.99).  I have two continuous rescaled predictors (including a time variable) and a binary treatment predictor. My only goal is to assess if there is any treatment effect (i.e. not to make predictions, so maybe overfitting is less of an issue here). As a random effect I have individual ID (~160 individuals, and around 28 observations per individual). The model fits reasonably well, but the main issue is that I get a very high conditional R-square: 0.986 (from: performance::r2(fit)) (marginal: 0.034) with the warning: "mu of 0.6 is too close to zero, estimate of random effect variances may be unreliable".
>>
>> I tried many things, including checking whether the model is singular (performance::check_singularity()), and it appeared not to be. Removing the fixed effects does not change anything either; shuffling the individual IDs led to a conditional R-squared around 0.25; and removing hens whose random-intercept modes were extreme did not change anything either (though the model generally fits better). Visualising the data reveals the individuals to be indeed quite consistent, but likely not up to the level that we could explain 98.7% of the variance, so I am quite confident the model is not reliable. It's the first time I am using beta regression and I feel that I am missing an important point here; any insight would be greatly appreciated!
>>
>> Best,
>> Camille
>>
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


