[R-sig-ME] How to fix a gamma model with poor fit?
Cátia Ferreira De Oliveira
cm|o500 @end|ng |rom york@@c@uk
Wed Jul 14 06:18:37 CEST 2021
Dear Philip,
Thank you for your comments! I decided to add the information about the
lmer model with logRT as the dependent variable. I decided against using
the lmer with logRT because it also seemed to show a poor fit, so the next
option seemed to be using the glmer + Gamma. In all plots I am plotting the
residuals of the model and not the raw dependent variable as I am aware the
assumption of normality is imposed on the residuals.
The variable RT is in milliseconds, there are just some very small response
times, though still accurate. I am a bit reluctant to remove them because
a) I did not pre-registered any criteria, b) based on the design one of the
groups was overall faster than the other and that may attenuate the group
differences, though maybe winsorization would be an option.
Here are the images for the lmer logRT model.
[7]: https://i.stack.imgur.com/l53YH.png
[8]: https://i.stack.imgur.com/9We75.png
[9]: https://i.stack.imgur.com/PNIqt.png
[10]: https://i.stack.imgur.com/crz9j.png
And the post on crossvalidated with everything:
https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit
Thank you again!
Best wishes,
Catia
On Wed, 14 Jul 2021 at 02:11, Phillip Alday <me using phillipalday.com> wrote:
> A few quick comments:
>
> 1. The QQ-plot diagnostic needs to be against the appropriate
> distribution. For a gamma model, the theoretical quantiles comes from
> the gamma distribution, not the normal distribution.
> 2. You have a log link but your reaction-time data looks to be in
> seconds, not milliseconds. Note that log10(0.1) = -1 and log10(1) = 0,
> but log10(100) = 2 and log10(1000) = 3, so you'll get very different
> answers for seconds vs. milliseconds. The reason why log transforms are
> so nice for reaction times is not "just" the skew, but rather that there
> is an underlying power law driving the effects *on the milliseconds* scale.
>
> Are you using a gamma model because of the Lo and Andrews paper? I've
> indicated my skepticism about that work previously, but these are my
> critical points:
>
> - there's still a transformation going on, it's just in the link function
> - having a nonlinear link complicates interpretation of the coefficients
> - gamma models are much harder to fit numerically (and I believe that
> codepath is less well tested in lme4; it's a known problem in
> MixedModels.jl)
> - the usual tests on residuals, etc. now have to be done against a gamma
> distribution, not a normal, but a lot of diagnostics use the normal by
> default
> - I don't understand the obsession with "satisfying normality
> assumptions" (from their abstract) in a GLMM -- half the point of the
> *generalized* bit is that you can swap in a different distributional
> assumption (the other half is the use of a link function)
>
>
> When thinking about using a model from a particular family/distribution,
> note that the distributional assumption is *on the residuals* so the
> skew in your raw data may or may not be present in the residuals. So
> maybe you don't need a Gamma model at all.
>
> Looking at your model output, it looks like you're using sum contrasts
> -- great! But checkout contr.Sum from the car package for nicer looking
> labels. :)
>
> Phillip
>
>
>
>
> On 13/7/21 6:18 pm, Cátia Ferreira De Oliveira via R-sig-mixed-models
> wrote:
> > Dear all,
> >
> > I am sorry for reposting here after posting on cross validated here
> > <
> https://stats.stackexchange.com/questions/534098/glmer-gamma-model-good-fit
> >
> > but I am still not sure what would be the best way of going about fixing
> > this model. It seems to have poor fit if you look at the plots as they
> have
> > extremes on both sides, which would not fit well with a gamma
> distribution.
> > Despite this, the results are consistent across packages (lme4, nlme...).
> >
> > I have 209062 rows of data and this is response time data.
> > I want to determine whether there are differences between groups (Groups
> -
> > 2 levels) on the learning of a task (Probability - 2 levels) across time
> > (within sessions - Block - 4 levels / across sessions - Session - 2
> > levels). It doesn't have zero response times, but some close to zero.
> >
> > Do you have any suggestions for how one can improve a model like this or
> > whether I should just use another distribution that fits the data a bit
> > better?
> >
> > Thank you!
> >
> > Catia
> >
> >
> > Model:
> >
> > glmer(RT ~ Prob * Bl * Session * Gr + (1 | Participant), data=
> > Data.trimmed, family = Gamma(link =
> > "log"), control=glmerControl(optimizer="bobyqa"))
> > Model summary:
> >
> > Generalized linear mixed model fit by maximum likelihood (Adaptive
> > Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
> > Family: Gamma ( log )
> > Formula: RT ~ Probability * Block * Session * Group + (1 |
> Participant)
> > Data: Data.trimmed
> > Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun =
> > 1e+06))
> >
> > AIC BIC logLik deviance df.resid
> > 2456107 2456538 -1228012 2456023 209020
> >
> > Scaled residuals:
> > Min 1Q Median 3Q Max
> > -4.297 -0.625 -0.158 0.440 35.691
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > Participant (Intercept) 0.002203 0.04694
> > Residual 0.053481 0.23126
> > Number of obs: 209062, groups: Participant, 130
> >
> > Fixed effects:
> > Estimate Std. Error t value
> > Pr(>|z|)
> > (Intercept) 6.024e+00 4.182e-03
> 1440.439 <
> > 2e-16 ***
> > Probability1 -2.835e-02 7.041e-04
> -40.265 <
> > 2e-16 ***
> > Block2-1 -2.925e-02 2.077e-03
> -14.084 <
> > 2e-16 ***
> > Block3-2 -3.676e-03 2.131e-03 -1.725
> > 0.084500 .
> > Block4-3 4.085e-03 2.307e-03 1.771
> > 0.076577 .
> > Block5-4 -1.220e-02 2.380e-03 -5.125
> > 2.98e-07 ***
> > Session1 4.795e-02 7.323e-04
> 65.478 <
> > 2e-16 ***
> > Group1 -4.616e-02 4.182e-03
> -11.037 <
> > 2e-16 ***
> > Probability1:Block2-1 -7.228e-03 2.077e-03 -3.480
> > 0.000501 ***
> > Probability1:Block3-2 -5.332e-03 2.131e-03 -2.503
> > 0.012331 *
> > Probability1:Block4-3 -2.076e-02 2.307e-03
> -8.999 <
> > 2e-16 ***
> > Probability1:Block5-4 6.044e-03 2.380e-03 2.539
> > 0.011104 *
> > Probability1:Session1 1.656e-03 7.046e-04 2.351
> > 0.018743 *
> > Block2-1:Session1 -1.972e-02 2.077e-03
> -9.494 <
> > 2e-16 ***
> > Block3-2:Session1 -8.521e-03 2.131e-03 -3.999
> > 6.35e-05 ***
> > Block4-3:Session1 4.380e-05 2.308e-03 0.019
> > 0.984856
> > Block5-4:Session1 -3.768e-03 2.380e-03 -1.583
> > 0.113389
> > Probability1:Group1 1.515e-03 7.041e-04 2.151
> > 0.031478 *
> > Block2-1:Group1 -6.161e-03 2.077e-03 -2.966
> > 0.003015 **
> > Block3-2:Group1 -1.129e-02 2.131e-03 -5.301
> > 1.15e-07 ***
> > Block4-3:Group1 7.095e-03 2.307e-03 3.076
> > 0.002101 **
> > Block5-4:Group1 -4.055e-03 2.380e-03 -1.704
> > 0.088414 .
> > Session1:Group1 3.782e-03 7.323e-04 5.164
> > 2.41e-07 ***
> > Probability1:Block2-1:Session1 5.729e-05 2.077e-03 0.028
> > 0.977997
> > Probability1:Block3-2:Session1 3.543e-03 2.131e-03 1.663
> > 0.096363 .
> > Probability1:Block4-3:Session1 -6.877e-03 2.308e-03 -2.980
> > 0.002886 **
> > Probability1:Block5-4:Session1 4.329e-03 2.380e-03 1.819
> > 0.068952 .
> > Probability1:Block2-1:Group1 -1.238e-03 2.077e-03 -0.596
> > 0.550980
> > Probability1:Block3-2:Group1 1.022e-02 2.131e-03 4.795
> > 1.63e-06 ***
> > Probability1:Block4-3:Group1 -6.532e-03 2.307e-03 -2.831
> > 0.004634 **
> > Probability1:Block5-4:Group1 2.351e-03 2.380e-03 0.988
> > 0.323373
> > Probability1:Session1:Group1 -1.805e-03 7.046e-04 -2.562
> > 0.010412 *
> > Block2-1:Session1:Group1 -2.060e-04 2.077e-03 -0.099
> > 0.920984
> > Block3-2:Session1:Group1 -4.211e-03 2.131e-03 -1.977
> > 0.048094 *
> > Block4-3:Session1:Group1 3.339e-03 2.308e-03 1.447
> > 0.147888
> > Block5-4:Session1:Group1 -3.956e-03 2.380e-03 -1.662
> > 0.096539 .
> > Probability1:Block2-1:Session1:Group1 -1.270e-03 2.077e-03 -0.611
> > 0.540933
> > Probability1:Block3-2:Session1:Group1 1.678e-03 2.131e-03 0.788
> > 0.430929
> > Probability1:Block4-3:Session1:Group1 -4.640e-03 2.308e-03 -2.010
> > 0.044392 *
> > Probability1:Block5-4:Session1:Group1 4.714e-03 2.380e-03 1.980
> > 0.047649 *
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Correlation matrix not shown by default, as p = 40 > 12.
> > Use print(x, correlation=TRUE) or
> > vcov(x) if you need it
> >
> > Plots:
> >
> > [1]: https://i.stack.imgur.com/XPdtl.png
> > [2]: https://i.stack.imgur.com/zUNRX.png
> > [3]: https://i.stack.imgur.com/6slYG.png
> > [4]: https://i.stack.imgur.com/LlRwT.png
> > [5]: https://i.stack.imgur.com/TNYCP.png
> > [6]: https://i.stack.imgur.com/45l0P.png
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
--
Cátia Margarida Ferreira de Oliveira
Psychology PhD Student
Department of Psychology, Room B214
University of York, YO10 5DD
pronouns: she, her
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list