[R-sig-ME] How to fix a gamma model with poor fit?

Wed Jul 14 09:41:32 CEST 2021

I don't have time to examine things in depth, but the log-RT seems to do
pretty well for a big chunk of the data. There's a pretty hefty left
tail on the residuals that's not great, but it's not the worst I've ever
seen. I would start thinking about why your model is breaking down more
on that side than on the right side.

I suspect you're hitting ceiling effects in your participants - in other
words, that you've run into an asymptote around a certain minimum
reaction time. I think that's an interesting statement in its own right:
the dominant factor isn't (just) your experimental manipulation but
rather biological limits. The ceiling would also be why your model
struggles with the left tail -- the line keeps going in one direction,
but the observed values don't. The one last "easy" transformation to try
would be looking at 1/RT (the inverse transform) instead of log RT.
Beyond that and you have to get into richer models with fancier
distributions or nonlinear components, see e.g.
https://lindeloev.github.io/shiny-rt/ for an overview.

Phillip

On 13/7/21 11:18 pm, Cátia Ferreira De Oliveira wrote:
> Dear Philip,
> 
> Thank you for your comments! I decided to add the information about the
> lmer model with logRT as the dependent variable. I decided against using
> the lmer with logRT because it also seemed to show a poor fit, so the
> next option seemed to be using the glmer + Gamma. In all plots I am
> plotting the residuals of the model and not the raw dependent variable
> as I am aware the assumption of normality is imposed on the residuals.
> The variable RT is in milliseconds, there are just some very small
> response times, though still accurate. I am a bit reluctant to remove
> them because a) I did not pre-registered any criteria, b) based on the
> design one of the groups was overall faster than the other and that may
> attenuate the group differences, though maybe winsorization would be an
> option.
> 
> Here are the images for the lmer logRT model.
> 
>   [7]: https://i.stack.imgur.com/l53YH.png
>   [8]: https://i.stack.imgur.com/9We75.png
>   [9]: https://i.stack.imgur.com/PNIqt.png
>   [10]: https://i.stack.imgur.com/crz9j.png
> 
> And the post on crossvalidated with
> everything: https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit
> 
> Thank you again!
> 
> Best wishes,
> 
> Catia
> 
> On Wed, 14 Jul 2021 at 02:11, Phillip Alday <me using phillipalday.com
> <mailto:me using phillipalday.com>> wrote:
> 
>     A few quick comments:
> 
>     1. The QQ-plot diagnostic needs to be against the appropriate
>     distribution. For a gamma model, the theoretical quantiles comes from
>     the gamma distribution, not the normal distribution.
>     2. You have a log link but your reaction-time data looks to be in
>     seconds, not milliseconds. Note that log10(0.1) = -1 and log10(1) = 0,
>     but log10(100) = 2 and log10(1000) = 3, so you'll get very different
>     answers for seconds vs. milliseconds. The reason why log transforms are
>     so nice for reaction times is not "just" the skew, but rather that there
>     is an underlying power law driving the effects *on the milliseconds*
>     scale.
> 
>     Are you using a gamma model because of the Lo and Andrews paper? I've
>     indicated my skepticism about that work previously, but these are my
>     critical points:
> 
>     - there's still a transformation going on, it's just in the link
>     function
>     - having a nonlinear link complicates interpretation of the coefficients
>     - gamma models are much harder to fit numerically (and I believe that
>     codepath is less well tested in lme4; it's a known problem in
>     MixedModels.jl)
>     - the usual tests on residuals, etc. now have to be done against a gamma
>     distribution, not a normal, but a lot of diagnostics use the normal by
>     default
>     - I don't understand the obsession with "satisfying normality
>     assumptions" (from their abstract) in a GLMM -- half the point of the
>     *generalized* bit is that you can swap in a different distributional
>     assumption (the other half is the use of a link function)
> 
> 
>     When thinking about using a model from a particular family/distribution,
>     note that the distributional assumption is *on the residuals* so the
>     skew in your raw data may or may not be present in the residuals. So
>     maybe you don't need a Gamma model at all.
> 
>     Looking at your model output, it looks like you're using sum contrasts
>     -- great! But checkout contr.Sum from the car package for nicer looking
>     labels. :)
> 
>     Phillip
> 
> 
> 
> 
>     On 13/7/21 6:18 pm, Cátia Ferreira De Oliveira via
>     R-sig-mixed-models wrote:
>     > Dear all,
>     >
>     > I am sorry for reposting here after posting on cross validated here
>     >
>     <https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit>
>     > but I am still not sure what would be the best way of going about
>     fixing
>     > this model. It seems to have poor fit if you look at the plots as
>     they have
>     > extremes on both sides, which would not fit well with a gamma
>     distribution.
>     > Despite this, the results are consistent across packages (lme4,
>     nlme...).
>     >
>     > I have 209062 rows of data and this is response time data.
>     > I want to determine whether there are differences between groups
>     (Groups -
>     > 2 levels) on the learning of a task (Probability - 2 levels)
>     across time
>     > (within sessions - Block - 4 levels / across sessions - Session - 2
>     > levels). It doesn't have zero response times, but some close to zero.
>     >
>     > Do you have any suggestions for how one can improve a model like
>     this or
>     > whether I should just use another distribution that fits the data
>     a bit
>     > better?
>     >
>     > Thank you!
>     >
>     > Catia
>     >
>     >
>     > Model:
>     >
>     >     glmer(RT ~ Prob * Bl * Session * Gr + (1  | Participant), data=
>     > Data.trimmed, family = Gamma(link =
>     > "log"), control=glmerControl(optimizer="bobyqa"))
>     > Model summary:
>     >
>     >     Generalized linear mixed model fit by maximum likelihood (Adaptive
>     > Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
>     >      Family: Gamma  ( log )
>     >     Formula: RT ~ Probability * Block * Session * Group + (1 |
>     Participant)
>     >        Data: Data.trimmed
>     >     Control: glmerControl(optimizer = "bobyqa", optCtrl =
>     list(maxfun =
>     > 1e+06))
>     >
>     >          AIC      BIC   logLik deviance df.resid
>     >      2456107  2456538 -1228012  2456023   209020
>     >
>     >     Scaled residuals:
>     >        Min     1Q Median     3Q    Max
>     >     -4.297 -0.625 -0.158  0.440 35.691
>     >
>     >     Random effects:
>     >      Groups      Name        Variance Std.Dev.
>     >      Participant (Intercept) 0.002203 0.04694
>     >      Residual                0.053481 0.23126
>     >     Number of obs: 209062, groups:  Participant, 130
>     >
>     >     Fixed effects:
>     >                                             Estimate Std. Error  t
>     value
>     > Pr(>|z|)
>     >     (Intercept)                            6.024e+00  4.182e-03
>     1440.439  <
>     > 2e-16 ***
>     >     Probability1                          -2.835e-02  7.041e-04 
>     -40.265  <
>     > 2e-16 ***
>     >     Block2-1                              -2.925e-02  2.077e-03 
>     -14.084  <
>     > 2e-16 ***
>     >     Block3-2                              -3.676e-03  2.131e-03 
>      -1.725
>     > 0.084500 .
>     >     Block4-3                               4.085e-03  2.307e-03   
>     1.771
>     > 0.076577 .
>     >     Block5-4                              -1.220e-02  2.380e-03 
>      -5.125
>     > 2.98e-07 ***
>     >     Session1                               4.795e-02  7.323e-04 
>      65.478  <
>     > 2e-16 ***
>     >     Group1                                -4.616e-02  4.182e-03 
>     -11.037  <
>     > 2e-16 ***
>     >     Probability1:Block2-1                 -7.228e-03  2.077e-03 
>      -3.480
>     > 0.000501 ***
>     >     Probability1:Block3-2                 -5.332e-03  2.131e-03 
>      -2.503
>     > 0.012331 *
>     >     Probability1:Block4-3                 -2.076e-02  2.307e-03 
>      -8.999  <
>     > 2e-16 ***
>     >     Probability1:Block5-4                  6.044e-03  2.380e-03   
>     2.539
>     > 0.011104 *
>     >     Probability1:Session1                  1.656e-03  7.046e-04   
>     2.351
>     > 0.018743 *
>     >     Block2-1:Session1                     -1.972e-02  2.077e-03 
>      -9.494  <
>     > 2e-16 ***
>     >     Block3-2:Session1                     -8.521e-03  2.131e-03 
>      -3.999
>     > 6.35e-05 ***
>     >     Block4-3:Session1                      4.380e-05  2.308e-03   
>     0.019
>     > 0.984856
>     >     Block5-4:Session1                     -3.768e-03  2.380e-03 
>      -1.583
>     > 0.113389
>     >     Probability1:Group1                    1.515e-03  7.041e-04   
>     2.151
>     > 0.031478 *
>     >     Block2-1:Group1                       -6.161e-03  2.077e-03 
>      -2.966
>     > 0.003015 **
>     >     Block3-2:Group1                       -1.129e-02  2.131e-03 
>      -5.301
>     > 1.15e-07 ***
>     >     Block4-3:Group1                        7.095e-03  2.307e-03   
>     3.076
>     > 0.002101 **
>     >     Block5-4:Group1                       -4.055e-03  2.380e-03 
>      -1.704
>     > 0.088414 .
>     >     Session1:Group1                        3.782e-03  7.323e-04   
>     5.164
>     > 2.41e-07 ***
>     >     Probability1:Block2-1:Session1         5.729e-05  2.077e-03   
>     0.028
>     > 0.977997
>     >     Probability1:Block3-2:Session1         3.543e-03  2.131e-03   
>     1.663
>     > 0.096363 .
>     >     Probability1:Block4-3:Session1        -6.877e-03  2.308e-03 
>      -2.980
>     > 0.002886 **
>     >     Probability1:Block5-4:Session1         4.329e-03  2.380e-03   
>     1.819
>     > 0.068952 .
>     >     Probability1:Block2-1:Group1          -1.238e-03  2.077e-03 
>      -0.596
>     > 0.550980
>     >     Probability1:Block3-2:Group1           1.022e-02  2.131e-03   
>     4.795
>     > 1.63e-06 ***
>     >     Probability1:Block4-3:Group1          -6.532e-03  2.307e-03 
>      -2.831
>     > 0.004634 **
>     >     Probability1:Block5-4:Group1           2.351e-03  2.380e-03   
>     0.988
>     > 0.323373
>     >     Probability1:Session1:Group1          -1.805e-03  7.046e-04 
>      -2.562
>     > 0.010412 *
>     >     Block2-1:Session1:Group1              -2.060e-04  2.077e-03 
>      -0.099
>     > 0.920984
>     >     Block3-2:Session1:Group1              -4.211e-03  2.131e-03 
>      -1.977
>     > 0.048094 *
>     >     Block4-3:Session1:Group1               3.339e-03  2.308e-03   
>     1.447
>     > 0.147888
>     >     Block5-4:Session1:Group1              -3.956e-03  2.380e-03 
>      -1.662
>     > 0.096539 .
>     >     Probability1:Block2-1:Session1:Group1 -1.270e-03  2.077e-03 
>      -0.611
>     > 0.540933
>     >     Probability1:Block3-2:Session1:Group1  1.678e-03  2.131e-03   
>     0.788
>     > 0.430929
>     >     Probability1:Block4-3:Session1:Group1 -4.640e-03  2.308e-03 
>      -2.010
>     > 0.044392 *
>     >     Probability1:Block5-4:Session1:Group1  4.714e-03  2.380e-03   
>     1.980
>     > 0.047649 *
>     >     ---
>     >     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>     >
>     >     Correlation matrix not shown by default, as p = 40 > 12.
>     >     Use print(x, correlation=TRUE)  or
>     >         vcov(x)        if you need it
>     >
>     > Plots:
>     >
>     >   [1]: https://i.stack.imgur.com/XPdtl.png
>     >   [2]: https://i.stack.imgur.com/zUNRX.png
>     >   [3]: https://i.stack.imgur.com/6slYG.png
>     >   [4]: https://i.stack.imgur.com/LlRwT.png
>     >   [5]: https://i.stack.imgur.com/TNYCP.png
>     >   [6]: https://i.stack.imgur.com/45l0P.png
>     >
>     >       [[alternative HTML version deleted]]
>     >
>     > _______________________________________________
>     > R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
> 
> 
> 
> -- 
> Cátia Margarida Ferreira de Oliveira
> Psychology PhD Student
> Department of Psychology, Room B214
> University of York, YO10 5DD
> pronouns: she, her