[R-sig-ME] Comparing Gaussian and beta regression

Wed Sep 19 17:02:17 CEST 2018

Thank you very much for the hints.

The "not very satisfactory" is from a "theoretical" point of view:
I'm not very comfortable using a Gaussian to model a value
constrained between 0 and 10, with the extremes observed fairly
often.  From a practical point of view, it does not seem to produce
unexpected results.  Of course, some effects are borderline
significant, which raises the question: how much of these effects is
true signal, and how much is due to a basically inadequate model?
Still, finding them with a sounder model would make them a
little bit more trustworthy...

For the ordinal outcome: I chose my example value badly and it was
misleading, sorry; the step is 0.1 and not 0.5; in practice,
46 different values were observed. Integer values and, to a lesser
extent, half-integer values are clearly over-represented, I guess
because of unconscious rounding during scoring.  I don't know how to
handle this in a model, but that's another problem, and maybe
there is no need for that.  As for the ordinal approach, I fear
that it would put too many parameters in the model...

Just thinking... Would it be conceivable to make inferences with the
beta-distribution model, since it seems to describe the data much
better, but use the linear model on the raw scale just to get
point estimates of the changes in an easier-to-interpret form?
[even though it is problematic close to the boundaries...]
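
An alternative to mixing the two models would be to back-convert the
beta-model coefficients themselves. What follows is only a rough
sketch in plain Python, not output from the actual fit: the intercept
and coefficient values are invented, the function names are
hypothetical, and it assumes that n in the transformation is the
sample size (about 400). For a mixed model, the result would be
conditional on a random effect of zero (a "typical" clinician).

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def to_raw_score(mu, n):
    """Undo y' = (y/10 * (n - 1) + 0.5) / n to return to the 0-10 scale."""
    return 10.0 * (mu * n - 0.5) / (n - 1)

def raw_scale_difference(intercept, beta, n):
    """Difference in mean score (0-10 scale) between a group coded 1
    and the reference group, for logit-scale intercept and coefficient."""
    mu_ref = inv_logit(intercept)
    mu_grp = inv_logit(intercept + beta)
    return to_raw_score(mu_grp, n) - to_raw_score(mu_ref, n)

n = 400  # assumed sample size used in the transformation

# Invented coefficients: the same beta gives a different raw-scale
# difference depending on the baseline (intercept).
diff_mid = raw_scale_difference(0.0, -0.4, n)   # baseline mean near 5
diff_high = raw_scale_difference(2.0, -0.4, n)  # baseline mean near 9
print(diff_mid, diff_high)
```

The point of the sketch is that, unlike with the Gaussian model, there
is no single "x points lower" figure: the raw-scale difference has to
be reported for a stated baseline.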

Is the Gelman & Hill book you're thinking of this one?

Data Analysis Using Regression and Multilevel/Hierarchical Models
 Cambridge University Press
ISBN-10: 052168689X

On Wed, Sep 19, 2018 at 09:49:51AM -0400, Ben Bolker wrote:
« 
« 
« On 2018-09-19 03:30 AM, Emmanuel Curis wrote:
« > Hello,
« > 
« > I'm making my first attempt at beta regression, with a mixed-effects model,
« > and was wondering if my reasoning is correct...
« > 
« > The context is a clinical study where the outcome is a score variable,
« > with continuous values between 0 and 10 (both included) and, in
« > practice, values with only one decimal digit (e.g. 1.5). There are
« > about 400 patients. The random effect is the clinician who performs the
« > examination and afterwards collects the score that evaluates their
« > intervention.
« > 
« > As a quick-and-dirty analysis, I did a linear mixed effect model on
« > the raw data, with lmer. Residuals and random effects are not so bad,
« > and results consistent & easy to interpret, but assuming a Gaussian
« > distribution is not very satisfactory.
« 
« Can you expand on why "not very satisfactory"?  Do you get unrealistic
« predictions etc.?
« 
«   This sounds like it could also be treated as an ordinal response (with
« 21 values {0, 0.5, 1, ... 9.5, 10}).
« > 
« > Hence, I tried a beta regression on the data after the transformation
« > (y/10 * (n-1) + 0.5) / n, and used glmmTMB for that. And of course I
« > wondered if the fit was better.
« > 
« > 1) Is it right that the log-likelihood of the model on the raw data
« >    (Gaussian) and on the transformed data (beta) cannot be compared,
« >    because they involve probability densities and not probabilities,
« >    hence depend on the data scale?
« 
«   You can compare log-likelihoods (actually technically they're
« log-likelihood *densities*, which is where the problem comes from) if
« you account for the scaling.  In this case since you're doing a linear
« transformation the scaling should be pretty easy.
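
To make the scaling concrete: for a linear transformation y' = a*y + b,
densities satisfy f_Y(y) = |a| * f_Y'(y'), so a log-likelihood computed
on the transformed scale is moved to the raw scale by adding
N * log|a|, with N the number of observations. Here
a = (n-1)/(10*n). A small check in plain Python, on made-up scores
and with an ordinary Gaussian fit (no random effects), assuming that
n in the transformation equals the number of observations:

```python
import math

def gaussian_loglik(values):
    """Maximised Gaussian log-likelihood (ML mean and variance)."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    return -0.5 * m * (math.log(2 * math.pi * var) + 1)

# Made-up scores on the 0-10 scale, for illustration only
scores = [0.0, 1.5, 3.2, 4.0, 5.0, 5.5, 7.1, 8.0, 9.4, 10.0]
n_obs = len(scores)
n = n_obs  # assumption: n in the transformation is the sample size

a = (n - 1) / (10 * n)  # slope of the linear transformation
squeezed = [a * y + 0.5 / n for y in scores]  # (y/10*(n-1) + 0.5)/n

ll_raw = gaussian_loglik(scores)
ll_squeezed = gaussian_loglik(squeezed)

# Jacobian correction: bring the transformed-scale value to the raw scale
ll_raw_from_squeezed = ll_squeezed + n_obs * math.log(a)
print(ll_raw, ll_raw_from_squeezed)
```

The same additive constant would apply to any model fitted on the
transformed scale, so two models fitted on the same scale can be
compared directly, and models fitted on different scales only after
this correction.
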
« > 
« > 2) Is it right that the lmer model fitted on the raw data and the same
« >    one fitted on the transformed data are conceptually the same, since
« >    the transformation is linear, so that the log-likelihood it gives
« >    is "the same" expressed in the two different scales? (of course,
« >    coefficients and so on will be different because of the scale
« >    change)
« 
«    Should be. (You could do a simple test of this ...)
« > 
« > 3) And so, is it correct to compare the log-likelihood (using logLik)
« >    or the AIC given by glmmTMB with the beta model and by lmer on
« >    transformed data to compare the two models (raw data Gaussian vs
« >    beta)?
« 
«   I would think so.
« > 
« >    If so, the beta model seems better than the Gaussian one. But now
« >    comes the interpretation problem, beyond "are coefficients
« >    significantly different from 0?".
« > 
« > 4) Since the default link is the logit for the mean, the interpretation is
« >    not quite clear to me.  For the Gaussian model on raw data,
« >    interpretation is clear, for instance "men score 1 point lower
« >    than women on average".  But how can the coefficients of the
« >    beta model be back-converted in a similar fashion?
« 
«    You probably need to go read stuff about interpretation of
« logit/log-odds  parameters: Gelman and Hill's book is good.
« 
« Quick rules of thumb:
« 
« * for β∆x small, as for log (proportional)
« * for intermediate values, linear change in probability with
« slope ≈ β/4
« * for large values, as for log ( 1 − x )
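
These rules of thumb can be checked numerically. The sketch below is
plain Python with an arbitrary, invented coefficient (beta = 0.2) and
hypothetical helper names; it reads "small", "intermediate" and "large"
as referring to the baseline mean on the (0, 1) scale, which is my
interpretation of the list above.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def shifted(p, beta):
    """Mean obtained by adding beta on the logit scale to baseline mean p."""
    return inv_logit(logit(p) + beta)

beta = 0.2  # invented logit-scale coefficient

# Intermediate baseline: roughly linear change, slope about beta/4
change_mid = shifted(0.5, beta) - 0.5

# Low baseline: behaves like a log link, the mean is multiplied by
# about exp(beta)
ratio_low = shifted(0.01, beta) / 0.01

# High baseline: the same holds for 1 - mean, multiplied by about
# exp(-beta)
ratio_high = (1.0 - shifted(0.99, beta)) / (1.0 - 0.99)

print(change_mid, beta / 4)
print(ratio_low, math.exp(beta))
print(ratio_high, math.exp(-beta))
```
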
« > 
« >    Would it be easier to use a log link and express changes on that
« >    scale as percent changes in the mean?
« 
«   This will work fine for low score values, but will run into trouble at
« the upper end of the score range.
« 
« > 
« > Thanks in advance,
« >
« 

-- 
                                Emmanuel CURIS
                                emmanuel.curis using parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html