[R-sig-ME] Comparing Gaussian and bêta regression
Emmanuel Curis
emmanuel.curis at parisdescartes.fr
Wed Sep 19 17:02:17 CEST 2018
Thank you very much for the hints.
The "not very satisfactory" is from a "theoretical" point of view:
I'm not very comfortable with using a Gaussian to model a value
constrained between 0 and 10, with the extremes observed fairly
often. From a practical point of view, it does not seem to produce
unexpected results. Of course, some effects are borderline
significant, which raises the question: how much of these effects
is true signal, and how much comes from a basically inadequate
model? Still finding them with a sounder model would make them a
little more trustworthy...
For the ordinal outcome: I chose my example value badly and it was
misleading, sorry; the step is 0.1 and not 0.5; in practice,
46 different values were observed. Integer values and, to a lesser
extent, half-integer values are clearly over-represented, I guess
because of unconscious rounding during scoring. I don't know how to
handle this in a model, but that's another problem, and maybe
there is no need for that. As for the ordinal approach, I fear
that it would put too many parameters in the model...
Just thinking... Would it be conceivable to make inferences with the
beta-distribution model, since it seems to describe the data much
better, but use the linear model on the raw scale just to get
point estimates of the changes in an easier-to-interpret form
[even though this is problematic close to the boundaries...]?
Is the Gelman & Hill book you're thinking of this one?
Data Analysis Using Regression and Multilevel/Hierarchical Models
Cambridge University Press
ISBN-10: 052168689X
On Wed, Sep 19, 2018 at 09:49:51AM -0400, Ben Bolker wrote:
«
«
« On 2018-09-19 03:30 AM, Emmanuel Curis wrote:
« > Hello,
« >
« > This is my first attempt at beta regression with a mixed-effects model,
« > and I was wondering if my reasoning is correct...
« >
« > The context is a clinical study where the outcome is a score variable,
« > with continuous values between 0 and 10 (both included) and, in
« > practice, values with only one decimal digit (e.g. 1.5). There are
« > about 400 patients. The random effect is the clinician who performs the
« > examination and afterwards records the score that evaluates the
« > intervention.
« >
« > As a quick-and-dirty analysis, I fitted a linear mixed-effects model on
« > the raw data, with lmer. Residuals and random effects are not so bad,
« > and the results are consistent and easy to interpret, but assuming a
« > Gaussian distribution is not very satisfactory.
«
« Can you expand on why "not very satisfactory"? Do you get unrealistic
« predictions etc.?
«
« This sounds like it could also be treated as an ordinal response (with
« 21 values {0, 0.5, 1, ... 9.5, 10}).
« >
« > Hence, I tried a beta regression on the data after the transformation
« > (y/10 * (n-1) + 0.5) / n, using glmmTMB. And of course I
« > wondered if the fit was better.
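To make the transformation above concrete, here is a minimal numeric sketch in plain Python (standard library only). It assumes that n denotes the number of observations, which the message does not state explicitly; the function name `squeeze` and the value n = 400 ("about 400 patients") are only illustrative.

```python
def squeeze(y, n, y_max=10.0):
    """Map a score in [0, y_max] into the open interval (0, 1).

    Implements (y / y_max * (n - 1) + 0.5) / n, so that the boundary
    scores 0 and y_max no longer land exactly on 0 or 1.
    """
    return (y / y_max * (n - 1) + 0.5) / n


n = 400  # assumption: n is the sample size
lo = squeeze(0.0, n)    # lowest possible score
mid = squeeze(5.0, n)   # midpoint of the scale
hi = squeeze(10.0, n)   # highest possible score

print(lo, mid, hi)
```

The two extremes end up at 0.5/n and 1 - 0.5/n, so the beta density is defined for every observed score, and the midpoint of the scale stays at 0.5.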
« >
« > 1) Is it right that the log-likelihood of the model on the raw data
« > (Gaussian) and on the transformed data (beta) cannot be compared,
« > because they involve probability densities and not probabilities,
« > and hence depend on the data scale?
«
« You can compare log-likelihoods (actually technically they're
« log-likelihood *densities*, which is where the problem comes from) if
« you account for the scaling. In this case since you're doing a linear
« transformation the scaling should be pretty easy.
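A small numeric sketch of this scaling, in plain Python with only the standard library. For a linear transformation y' = a*y + b, each density value changes by a factor 1/a, so a log-likelihood computed on the transformed scale differs from the raw-scale one by N*log(a). The scores and the Gaussian parameters below are made up for illustration, and n is assumed to be the number of observations.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


scores = [0.0, 2.5, 5.0, 7.3, 10.0]  # made-up raw scores on the 0-10 scale
n = len(scores)                      # assumption: n is the number of observations

# (y/10 * (n-1) + 0.5) / n written as a*y + b
a = (n - 1) / (10.0 * n)
b = 0.5 / n

mu, sigma = 5.0, 3.0  # arbitrary Gaussian parameters on the raw scale

# Log-likelihood on the raw scale
ll_raw = sum(norm_logpdf(y, mu, sigma) for y in scores)

# Same model expressed on the transformed scale (mean and sd rescaled)
ll_trans = sum(norm_logpdf(a * y + b, a * mu + b, a * sigma) for y in scores)

# Jacobian correction: bring the transformed-scale value back to the raw scale
ll_trans_on_raw_scale = ll_trans + n * math.log(a)

print(ll_raw, ll_trans, ll_trans_on_raw_scale)
```

Under these assumptions, the same correction term would apply to any log-likelihood computed on the transformed scale, a beta one included; alternatively, fitting both models to the transformed data puts them on a common scale without any correction.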
« >
« > 2) Is it right that the lmer model fitted on the raw data and the same
« > one fitted on the transformed data are conceptually the same, since
« > the transformation is linear, so that the log-likelihood it gives
« > is "the same" expressed on the two different scales? (of course,
« > coefficients and so on will be different because of the scale
« > change)
«
« Should be. (You could do a simple test of this ...)
« >
« > 3) And so, is it correct to compare the log-likelihood (using logLik)
« > or the AIC given by glmmTMB for the beta model and by lmer on the
« > transformed data, in order to compare the two models (raw-data Gaussian
« > vs beta)?
«
« I would think so.
« >
« > If so, the beta model seems better than the Gaussian one. But now
« > comes the interpretation problem, beyond "are the coefficients
« > significantly different from 0?".
« >
« > 4) Since the default link is the logit for the mean, the interpretation is
« > not quite clear to me. For the Gaussian model on raw data,
« > interpretation is clear, for instance "men score 1 point lower
« > than women on average". But how can the coefficients of the
« > beta model be back-transformed in a similar fashion?
«
« You probably need to go read stuff about interpretation of
« logit/log-odds parameters: Gelman and Hill's book is good.
«
« Quick rules of thumb:
«
« * for β∆x small, as for log (proportional)
« * for intermediate values, linear change in probability with
« slope ≈ β/4
« * for large values, as for log(1 − x)
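These rules of thumb can be checked numerically. The sketch below is plain Python (standard library only); the coefficient beta = 0.4 and the baseline values of the linear predictor are hypothetical, chosen only to illustrate the three regimes.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


beta = 0.4  # hypothetical logit-scale coefficient for a one-unit change in x

# Intermediate baseline (mean near 0.5): change in the mean is about beta / 4
dp_mid = inv_logit(0.0 + beta) - inv_logit(0.0)

# Low baseline: the mean is multiplied by roughly exp(beta), as with a log link
eta_low = -5.0
ratio_low = inv_logit(eta_low + beta) / inv_logit(eta_low)

# High baseline: (1 - mean) is multiplied by roughly exp(-beta)
eta_high = 5.0
ratio_high = (1.0 - inv_logit(eta_high + beta)) / (1.0 - inv_logit(eta_high))

print(dp_mid, beta / 4.0)
print(ratio_low, math.exp(beta))
print(ratio_high, math.exp(-beta))
```

On a 0 to 10 score, and ignoring the small shrinkage of the transformation, a change of beta/4 in the mean near the middle of the scale would correspond to about 10 * beta/4 points.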
« >
« > Would it be easier to use a log link and express changes on that
« > scale as percent changes in the mean?
«
« This will work fine for low score values, but will run into trouble at
« the upper end of the score range.
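A minimal sketch of that trouble, in plain Python (standard library only). With a log link an effect multiplies the mean by exp(beta), which can push a mean that is already high past the upper bound of 1 on the (0, 1) scale. The coefficient and the two baseline means are hypothetical.

```python
import math

beta = 0.2  # hypothetical log-scale coefficient (about a 22% increase in the mean)

low_mean = 0.10   # baseline mean near the bottom of the (0, 1) scale
high_mean = 0.90  # baseline mean near the top

# Log link: the effect is multiplicative on the mean
low_after = low_mean * math.exp(beta)
high_after = high_mean * math.exp(beta)

print(low_after)   # still inside (0, 1)
print(high_after)  # above 1: not a valid mean for a beta distribution
```

The logit link does not have this problem, since its inverse always returns a value strictly between 0 and 1.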
«
« >
« > Thanks in advance,
« >
«
--
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html