[R-sig-ME] How to fix a gamma model with poor fit?
Phillip Alday
me @end|ng |rom ph||||p@|d@y@com
Wed Jul 14 09:41:32 CEST 2021
I don't have time to examine things in depth, but the log-RT seems to do
pretty well for a big chunk of the data. There's a pretty hefty left
tail on the residuals that's not great, but it's not the worst I've ever
seen. I would start thinking about why your model is breaking down more
on that side than on the right side.
I suspect you're hitting ceiling effects in your participants - in other
words, that you've run into an asymptote around a certain minimum
reaction time. I think that's an interesting statement in its own right:
the dominant factor isn't (just) your experimental manipulation but
rather biological limits. The ceiling would also be why your model
struggles with the left tail -- the line keeps going in one direction,
but the observed values don't. The one last "easy" transformation to try
would be looking at 1/RT (the inverse transform) instead of log RT.
Beyond that and you have to get into richer models with fancier
distributions or nonlinear components, see e.g.
https://lindeloev.github.io/shiny-rt/ for an overview.
Phillip
On 13/7/21 11:18 pm, Cátia Ferreira De Oliveira wrote:
> Dear Philip,
>
> Thank you for your comments! I decided to add the information about the
> lmer model with logRT as the dependent variable. I decided against using
> the lmer with logRT because it also seemed to show a poor fit, so the
> next option seemed to be using the glmer + Gamma. In all plots I am
> plotting the residuals of the model and not the raw dependent variable
> as I am aware the assumption of normality is imposed on the residuals.
> The variable RT is in milliseconds, there are just some very small
> response times, though still accurate. I am a bit reluctant to remove
> them because a) I did not pre-registered any criteria, b) based on the
> design one of the groups was overall faster than the other and that may
> attenuate the group differences, though maybe winsorization would be an
> option.
>
> Here are the images for the lmer logRT model.
>
> [7]: https://i.stack.imgur.com/l53YH.png
> [8]: https://i.stack.imgur.com/9We75.png
> [9]: https://i.stack.imgur.com/PNIqt.png
> [10]: https://i.stack.imgur.com/crz9j.png
>
> And the post on crossvalidated with
> everything: https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit
>
> Thank you again!
>
> Best wishes,
>
> Catia
>
> On Wed, 14 Jul 2021 at 02:11, Phillip Alday <me using phillipalday.com
> <mailto:me using phillipalday.com>> wrote:
>
> A few quick comments:
>
> 1. The QQ-plot diagnostic needs to be against the appropriate
> distribution. For a gamma model, the theoretical quantiles comes from
> the gamma distribution, not the normal distribution.
> 2. You have a log link but your reaction-time data looks to be in
> seconds, not milliseconds. Note that log10(0.1) = -1 and log10(1) = 0,
> but log10(100) = 2 and log10(1000) = 3, so you'll get very different
> answers for seconds vs. milliseconds. The reason why log transforms are
> so nice for reaction times is not "just" the skew, but rather that there
> is an underlying power law driving the effects *on the milliseconds*
> scale.
>
> Are you using a gamma model because of the Lo and Andrews paper? I've
> indicated my skepticism about that work previously, but these are my
> critical points:
>
> - there's still a transformation going on, it's just in the link
> function
> - having a nonlinear link complicates interpretation of the coefficients
> - gamma models are much harder to fit numerically (and I believe that
> codepath is less well tested in lme4; it's a known problem in
> MixedModels.jl)
> - the usual tests on residuals, etc. now have to be done against a gamma
> distribution, not a normal, but a lot of diagnostics use the normal by
> default
> - I don't understand the obsession with "satisfying normality
> assumptions" (from their abstract) in a GLMM -- half the point of the
> *generalized* bit is that you can swap in a different distributional
> assumption (the other half is the use of a link function)
>
>
> When thinking about using a model from a particular family/distribution,
> note that the distributional assumption is *on the residuals* so the
> skew in your raw data may or may not be present in the residuals. So
> maybe you don't need a Gamma model at all.
>
> Looking at your model output, it looks like you're using sum contrasts
> -- great! But checkout contr.Sum from the car package for nicer looking
> labels. :)
>
> Phillip
>
>
>
>
> On 13/7/21 6:18 pm, Cátia Ferreira De Oliveira via
> R-sig-mixed-models wrote:
> > Dear all,
> >
> > I am sorry for reposting here after posting on cross validated here
> >
> <https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit>
> > but I am still not sure what would be the best way of going about
> fixing
> > this model. It seems to have poor fit if you look at the plots as
> they have
> > extremes on both sides, which would not fit well with a gamma
> distribution.
> > Despite this, the results are consistent across packages (lme4,
> nlme...).
> >
> > I have 209062 rows of data and this is response time data.
> > I want to determine whether there are differences between groups
> (Groups -
> > 2 levels) on the learning of a task (Probability - 2 levels)
> across time
> > (within sessions - Block - 4 levels / across sessions - Session - 2
> > levels). It doesn't have zero response times, but some close to zero.
> >
> > Do you have any suggestions for how one can improve a model like
> this or
> > whether I should just use another distribution that fits the data
> a bit
> > better?
> >
> > Thank you!
> >
> > Catia
> >
> >
> > Model:
> >
> > glmer(RT ~ Prob * Bl * Session * Gr + (1 | Participant), data=
> > Data.trimmed, family = Gamma(link =
> > "log"), control=glmerControl(optimizer="bobyqa"))
> > Model summary:
> >
> > Generalized linear mixed model fit by maximum likelihood (Adaptive
> > Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
> > Family: Gamma ( log )
> > Formula: RT ~ Probability * Block * Session * Group + (1 |
> Participant)
> > Data: Data.trimmed
> > Control: glmerControl(optimizer = "bobyqa", optCtrl =
> list(maxfun =
> > 1e+06))
> >
> > AIC BIC logLik deviance df.resid
> > 2456107 2456538 -1228012 2456023 209020
> >
> > Scaled residuals:
> > Min 1Q Median 3Q Max
> > -4.297 -0.625 -0.158 0.440 35.691
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > Participant (Intercept) 0.002203 0.04694
> > Residual 0.053481 0.23126
> > Number of obs: 209062, groups: Participant, 130
> >
> > Fixed effects:
> > Estimate Std. Error t
> value
> > Pr(>|z|)
> > (Intercept) 6.024e+00 4.182e-03
> 1440.439 <
> > 2e-16 ***
> > Probability1 -2.835e-02 7.041e-04
> -40.265 <
> > 2e-16 ***
> > Block2-1 -2.925e-02 2.077e-03
> -14.084 <
> > 2e-16 ***
> > Block3-2 -3.676e-03 2.131e-03
> -1.725
> > 0.084500 .
> > Block4-3 4.085e-03 2.307e-03
> 1.771
> > 0.076577 .
> > Block5-4 -1.220e-02 2.380e-03
> -5.125
> > 2.98e-07 ***
> > Session1 4.795e-02 7.323e-04
> 65.478 <
> > 2e-16 ***
> > Group1 -4.616e-02 4.182e-03
> -11.037 <
> > 2e-16 ***
> > Probability1:Block2-1 -7.228e-03 2.077e-03
> -3.480
> > 0.000501 ***
> > Probability1:Block3-2 -5.332e-03 2.131e-03
> -2.503
> > 0.012331 *
> > Probability1:Block4-3 -2.076e-02 2.307e-03
> -8.999 <
> > 2e-16 ***
> > Probability1:Block5-4 6.044e-03 2.380e-03
> 2.539
> > 0.011104 *
> > Probability1:Session1 1.656e-03 7.046e-04
> 2.351
> > 0.018743 *
> > Block2-1:Session1 -1.972e-02 2.077e-03
> -9.494 <
> > 2e-16 ***
> > Block3-2:Session1 -8.521e-03 2.131e-03
> -3.999
> > 6.35e-05 ***
> > Block4-3:Session1 4.380e-05 2.308e-03
> 0.019
> > 0.984856
> > Block5-4:Session1 -3.768e-03 2.380e-03
> -1.583
> > 0.113389
> > Probability1:Group1 1.515e-03 7.041e-04
> 2.151
> > 0.031478 *
> > Block2-1:Group1 -6.161e-03 2.077e-03
> -2.966
> > 0.003015 **
> > Block3-2:Group1 -1.129e-02 2.131e-03
> -5.301
> > 1.15e-07 ***
> > Block4-3:Group1 7.095e-03 2.307e-03
> 3.076
> > 0.002101 **
> > Block5-4:Group1 -4.055e-03 2.380e-03
> -1.704
> > 0.088414 .
> > Session1:Group1 3.782e-03 7.323e-04
> 5.164
> > 2.41e-07 ***
> > Probability1:Block2-1:Session1 5.729e-05 2.077e-03
> 0.028
> > 0.977997
> > Probability1:Block3-2:Session1 3.543e-03 2.131e-03
> 1.663
> > 0.096363 .
> > Probability1:Block4-3:Session1 -6.877e-03 2.308e-03
> -2.980
> > 0.002886 **
> > Probability1:Block5-4:Session1 4.329e-03 2.380e-03
> 1.819
> > 0.068952 .
> > Probability1:Block2-1:Group1 -1.238e-03 2.077e-03
> -0.596
> > 0.550980
> > Probability1:Block3-2:Group1 1.022e-02 2.131e-03
> 4.795
> > 1.63e-06 ***
> > Probability1:Block4-3:Group1 -6.532e-03 2.307e-03
> -2.831
> > 0.004634 **
> > Probability1:Block5-4:Group1 2.351e-03 2.380e-03
> 0.988
> > 0.323373
> > Probability1:Session1:Group1 -1.805e-03 7.046e-04
> -2.562
> > 0.010412 *
> > Block2-1:Session1:Group1 -2.060e-04 2.077e-03
> -0.099
> > 0.920984
> > Block3-2:Session1:Group1 -4.211e-03 2.131e-03
> -1.977
> > 0.048094 *
> > Block4-3:Session1:Group1 3.339e-03 2.308e-03
> 1.447
> > 0.147888
> > Block5-4:Session1:Group1 -3.956e-03 2.380e-03
> -1.662
> > 0.096539 .
> > Probability1:Block2-1:Session1:Group1 -1.270e-03 2.077e-03
> -0.611
> > 0.540933
> > Probability1:Block3-2:Session1:Group1 1.678e-03 2.131e-03
> 0.788
> > 0.430929
> > Probability1:Block4-3:Session1:Group1 -4.640e-03 2.308e-03
> -2.010
> > 0.044392 *
> > Probability1:Block5-4:Session1:Group1 4.714e-03 2.380e-03
> 1.980
> > 0.047649 *
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Correlation matrix not shown by default, as p = 40 > 12.
> > Use print(x, correlation=TRUE) or
> > vcov(x) if you need it
> >
> > Plots:
> >
> > [1]: https://i.stack.imgur.com/XPdtl.png
> > [2]: https://i.stack.imgur.com/zUNRX.png
> > [3]: https://i.stack.imgur.com/6slYG.png
> > [4]: https://i.stack.imgur.com/LlRwT.png
> > [5]: https://i.stack.imgur.com/TNYCP.png
> > [6]: https://i.stack.imgur.com/45l0P.png
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org
> <mailto:R-sig-mixed-models using r-project.org> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
>
>
> --
> Cátia Margarida Ferreira de Oliveira
> Psychology PhD Student
> Department of Psychology, Room B214
> University of York, YO10 5DD
> pronouns: she, her
More information about the R-sig-mixed-models
mailing list