[R-sig-ME] R-sig-mixed-models Digest, Vol 134, Issue 42

Kornbrot, Diana d.e.kornbrot at herts.ac.uk
Fri Mar 30 10:28:08 CEST 2018


Thanks for the help.

Have now successfully installed emmeans.
Now I have to figure out why SPSS and R do not agree, and why R thinks df = Inf.
________
R: glmer + emmeans                                         SPSS: MIXED
Rab  Between  emmean   SE     df   asymp.LCL  asymp.UCL    Rab  Between  Mean    SE     lcl     ucl
1    1        -.316    .170   Inf  -.650      .018         1    1        -.267   .148   -.562   .027
2    1        -.820    .172   Inf  -1.158     -.482        2    1        -.688   .169   -1.023  -.352
3    1        -1.838   .183   Inf  -2.196     -1.480       3    1        -1.550  .188   -1.924  -1.176
4    1        -2.558   .198   Inf  -2.946     -2.170       4    1        -2.168  .230   -2.624  -1.712
1    2        .357     .165   Inf  .034       .681         1    2        .297    .144   .011    .583
2    2        -.895    .167   Inf  -1.223     -.567        2    2        -.745   .165   -1.073  -.417
3    2        -2.607   .192   Inf  -2.984     -2.230       3    2        -2.238  .235   -2.705  -1.772
4    2        -2.891   .201   Inf  -3.285     -2.497       4    2        -2.498  .255   -3.005  -1.992
R: anova-style output                         SPSS: fixed-effects tests
Source       Df  SS      MS      F            Source       F      df1  df2  Sig.
Rab          3   870.17  290.06  290.06       Rab          66.76  3    94   .000000
Between      1   .12     .12     .12          Between      .38    1    93   .539643
Rab:Between  3   63.20   21.07   21.07        Rab*Between  5.31   3    94   .001998
                                              Corrected    30.64  7    101  .000000
Somehow need to tell R about the error structure
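A minimal sketch of where the df = Inf comes from on the R side (assuming lme4 and emmeans are installed; the data frame name dat and its columns are taken from the model calls below, not guaranteed):

```r
## Hypothetical sketch: 'dat' with columns FreqPos, FreqNeg, Rab, Between,
## Participant is assumed, not taken verbatim from this thread.
library(lme4)
library(emmeans)

logit1 <- glmer(cbind(FreqPos, FreqNeg) ~ Rab * Between + (1 | Participant),
                family = binomial(link = "logit"), data = dat)

## emmeans uses asymptotic (z-based) inference for glmer fits, which is why
## df is reported as Inf; SPSS GENLINMIXED uses Satterthwaite df instead,
## so some disagreement in the CIs and tests is expected.
emmeans(logit1, ~ Rab * Between)   # marginal means, SEs, asymptotic CIs
joint_tests(logit1)                # joint Wald tests of Rab, Between, Rab:Between
```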

NB lmer on logit(FreqPos/Nmax) is not the correct binomial model, because it does not take into account the binomial variability inherent in measuring any binary proportion.
See Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434-446.
What's more, I have verified that there are differences on 17 empirical data sets.



On 27 Feb 2018, at 13:06, r-sig-mixed-models-request at r-project.org<mailto:r-sig-mixed-models-request at r-project.org> wrote:

Send R-sig-mixed-models mailing list submissions to
r-sig-mixed-models at r-project.org<mailto:r-sig-mixed-models at r-project.org>

To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
or, via email, send a message with subject or body 'help' to
r-sig-mixed-models-request at r-project.org

You can reach the person managing the list at
r-sig-mixed-models-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-sig-mixed-models digest..."


Today's Topics:

  1. Re: Including random effects creates structure in the
     residuals (Paul Johnson)
  2. Re: means , CIs from lmer, glmer (Ben Bolker)
  3. Re: Including random effects creates structure in the
     residuals (Pierre de Villemereuil)

----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Feb 2018 11:03:17 +0000
From: Paul Johnson <paul.johnson at glasgow.ac.uk>
To: Pierre de Villemereuil <pierre.de.villemereuil at mailoo.org>
Cc: R SIG Mixed Models <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Including random effects creates structure in
the residuals
Message-ID: <64A7142C-55FA-43E8-B457-8C80D8AA1DE4 at glasgow.ac.uk>
Content-Type: text/plain; charset="utf-8"

Hi Pierre,

I don’t think there is a problem with the residuals. Just to check, the problem you see is that there’s a linear trend in the residuals vs fitted values plot when the ID random effect is included (which in a standard OLS LM would be impossible).

The reason for the correlation is that the fitted values contain the ID random effects, and these are inevitably correlated with the residuals. My intuitive understanding of this is as follows. Say some students sit a test twice, on two separate days. A student's score on a given day will be a combination of their ability (ID random effect) and unmeasured (i.e. noise) factors, like how the student was feeling on that day. Assuming that both ability and luck contribute substantially to the scores, it’s inevitable that the extreme upper end of the distribution will be populated by scores from students who are both able (high ID random effect) and were lucky on that day (high error residual). The same goes in the negative direction for the lower end of the distribution. This is the basis of regression to the mean: if we pick a student with an extreme score and re-test them, we expect their score to be less extreme. If I remember correctly it’s fairly straightforward to predict the correlation of the residuals and fitted values for a given model.
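The students example can be illustrated with a toy base-R simulation (all names invented for illustration):

```r
## Regression to the mean: extreme day-1 scores combine high ability AND
## good luck, and the luck does not repeat on day 2.
set.seed(1)
n <- 10000
ability <- rnorm(n)          # the "ID random effect"
day1 <- ability + rnorm(n)   # observed score = ability + daily noise
day2 <- ability + rnorm(n)   # retest with fresh noise

top <- day1 > quantile(day1, 0.95)  # students extreme on day 1
mean(day1[top])   # well above the population mean
mean(day2[top])   # noticeably closer to it: regression to the mean
```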

On the broader topic of checking residuals from GLMMs…
I wrote a simple function to check residuals from lme4 fits by simulating residuals from the fitted model and plotting them on top of the real residuals. If they look similar on several simulated data sets then I’m reassured that the model fits well. This is particularly useful for non-normal GLMMs where (despite popular belief) there's no assumption of normality of the Pearson residuals.

library(devtools)
install_github("pcdjohnson/GLMMmisc")
library(GLMMmisc)
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
sim.residplot(fm1)
# note the correlation between the residuals and the fitted values

Florian Hartig has written a more sophisticated package that uses the same basic idea called DHARMa:
https://cran.r-project.org/web/packages/DHARMa/index.html
His blog post:
https://theoreticalecology.wordpress.com/2016/08/28/dharma-an-r-package-for-residual-diagnostics-of-glmms/

All the best,
Paul


On 27 Feb 2018, at 08:53, Pierre de Villemereuil <pierre.de.villemereuil at mailoo.org> wrote:

Dear all,

I have an issue that I can't get my head around. I am working on a human cohort dataset studying heart rate. We have repeated measures at several time points and a model with different slopes according to binned age categories (the variable called "broken" hereafter, for "broken lines").

My issue is that when I include an individual ID effect (to account for the repeated measures), I obtain structured residuals while this is not the case for a model without this effect.

Here are my models:
mod_withID <- lmer(cardfreq ~ sex + broken + age:broken + betabloq +
                     cafethe + tabac + alcool +
                     (1 | visite) + (1 | id),
                   data = sub)
mod_noID <- lmer(cardfreq ~ sex + broken + age:broken + betabloq +
                   cafethe + tabac + alcool +
                   (1 | visite),
                 data = sub)

The AIC (computed from fits with REML = FALSE) clearly favours the model including the ID effect:
AIC(mod_withID)
75184.51
AIC(mod_noID)
76942.09

Yet the model including the ID effect shows a bad fit from the residuals' point of view (structured residuals), as the plots below show:
- The residuals with the ID effect:
https://ibb.co/b6WsFx
- The residuals without the ID effect:
https://ibb.co/fFVDNc

From this, I gather that the fixed-effect part is good enough to provide a good fit, but there is a covariance between the residuals and the BLUPs from the ID effect (I've checked this). In particular, if we marginalise over the random effects to compute the residuals, then everything is fine, suggesting the issue lies in the random rather than the fixed part.

I'm a bit puzzled by this. Why would adding an individual effect create such a structure in the residual part? Why does this covariance between the individual BLUPs and the residuals arise?

I'd happily take anyone's input on this as I'm at a loss regarding what to do to solve this.

Cheers,
Pierre

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



------------------------------

Message: 2
Date: Tue, 27 Feb 2018 07:14:06 -0500
From: Ben Bolker <bbolker at gmail.com>
To: Rune Haubo <rune.haubo at gmail.com>
Cc: "Kornbrot, Diana" <d.e.kornbrot at herts.ac.uk>, "Paice, Andrew"
<a.paice at herts.ac.uk>, "r-sig-mixed-models at r-project.org"
<r-sig-mixed-models at r-project.org>, "Georgiou, George"
<g.j.georgiou at herts.ac.uk>, "Sullivan, Keith"
<k.sullivan3 at herts.ac.uk>
Subject: Re: [R-sig-ME] means , CIs from lmer, glmer
Message-ID: <c900eade-264f-ca69-af18-2ec0f67d936b at gmail.com>
Content-Type: text/plain; charset="utf-8"


 Yes, that was a thinko on my part.  Thanks.

On 18-02-27 05:22 AM, Rune Haubo wrote:
Just a small note that lmerTest (and the Satterthwaite method for
degrees of freedom) is only meaningful for _linear_ mixed models - not
for the generalized variants such as the one considered here for
proportions.

Best,
Rune

On 27 February 2018 at 02:02, Ben Bolker <bbolker at gmail.com> wrote:
 Hi Diana,

A reproducible example is always helpful/increases your chances of
getting a useful answer ...
It might help if you included the SPSS output (or posted it somewhere
-- note that this list doesn't take HTML-formatted messages nor most
attachments), as many of us don't have access to it.

Look into the (very well-documented) emmeans package:
https://CRAN.R-project.org/package=emmeans
and the lmerTest package (for Satterthwaite df approximations)

On Mon, Feb 26, 2018 at 12:11 PM, Kornbrot, Diana
<d.e.kornbrot at herts.ac.uk> wrote:
I am keen to promote the use of generalised mixed models for the analysis of proportions to psychologists.
I have straightforward code in SPSS [costly] and would like to supply equivalent R code without ‘tears’.
The design is as follows: the raw frequencies are FreqPos for ‘success’ and FreqNeg for ‘failure’.
The predictors are Rab, with 4 levels, repeated over participants, and Between, with 2 separate groups of participants.
Model is binomial with logit link

I require the following output to correspond to the SPSS output from the code below.
Descriptive: means, SEs and 95% CIs by Rab, by Between and by Rab*Between.
Inferential: for Rab, Between and Rab*Between: F value, MSE, numerator df, denominator df [this enables p-values].

Have tried

logit1 <- glmer(cbind(FreqPos,FreqNeg) ~ Rab + Between + Rab*Between + (1| Participant), family=binomial(link="logit"))
gives F and MS but no denominator df or p-values. Different results from SPSS.
NB F = MS: that can't be right; F is supposed to be a ratio of chi-squares.

summary(logit1)
gives coefficients and SEs. Different results from SPSS.
I also tried predict() and fitted(), but still no means.

I have spent days searching the internet for examples, but none of them seems to show how to get the output I need.

All help greatly appreciated

____
Spss syntax

*Generalized Linear Mixed Models.
GENLINMIXED
 /DATA_STRUCTURE SUBJECTS=Participant REPEATED_MEASURES=Rab COVARIANCE_TYPE=UNSTRUCTURED
 /FIELDS TARGET=FreqPos TRIALS=FIELD(Nmax)  OFFSET=NONE
 /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
 /FIXED  EFFECTS=Rab Between Rab*Between USE_INTERCEPT=TRUE
 /BUILD_OPTIONS TARGET_CATEGORY_ORDER=DESCENDING INPUTS_CATEGORY_ORDER=DESCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=MODEL PCONVERGE=0.000001(ABSOLUTE) SCORING=0 SINGULAR=0.000000000001
 /EMMEANS TABLES=Rab COMPARE=Rab CONTRAST=DEVIATION
  /EMMEANS TABLES=Between CONTRAST=NONE
  /EMMEANS TABLES=Rab*Between CONTRAST=NONE
 /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD.

best
Diana


_____________________________________
Professor Diana Kornbrot
Mobile
+44 (0) 7403 18 16 12
Work
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 170 728 4626
d.e.kornbrot at herts.ac.uk<mailto:d.e.kornbrot at herts.ac.uk>
http://dianakornbrot.wordpress.com/
http://go.herts.ac.uk/Diana_Kornbrot
skype:  kornbrotme
Home
19 Elmhurst Avenue
London N2 0LT, UK
+44 (0) 208 444 2081
------------------------------------------------------------





_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




------------------------------

Message: 3
Date: Tue, 27 Feb 2018 14:06:20 +0100
From: Pierre de Villemereuil <pierre.de.villemereuil at mailoo.org>
To: Paul Johnson <paul.johnson at glasgow.ac.uk>
Cc: R SIG Mixed Models <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Including random effects creates structure in
the residuals
Message-ID: <1554686.l50GCRrqCa at flyosflip>
Content-Type: text/plain; charset="utf-8"

Hi Paul,

Thank you for your response and interest in my question. While I agree that regression to the mean does exist in these settings, I don't get why this should yield such a correlation between the BLUPs and the residuals (after all, assuming the two are totally independent, you'd still get the same phenomenon you're describing, wouldn't you?). Could you explain why this should be the case? Maybe I'm missing a big point in your explanation; if so, please forgive me.

It got me thinking, however, that the correlation between the BLUPs and the residuals could arise from a fundamental constraint in the data, as you suggested, and I think I now understand what is going on (again, if this is what you suggested, please forgive me, as I might have misunderstood your point). In short, it arises from an unbalanced design in the repeated measures (some individuals do not come back to complete the study).

This can be seen in the following graph, which shows the residuals (e) against the BLUPs (u; these also contain the effect of "visite", but that doesn't affect the trend much here), depending on whether we have 1, 2, 3 or 4 repeated measures for that individual:
https://ibb.co/dDgF3H

A perfect linear covariation is to be expected with only 1 visit, because the BLUP and the residual are then essentially non-identifiable, and this constraint fades as more repeated measures are added to the data. Does this interpretation make sense to you?
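For what it's worth, that check could be sketched like this (assuming the mod_withID fit and data frame sub from the earlier message; the column names here are guesses):

```r
## Hypothetical sketch: assumes mod_withID and 'sub' as defined earlier.
library(lme4)
u <- ranef(mod_withID)$id[as.character(sub$id), 1]    # per-row ID BLUPs
e <- residuals(mod_withID)
nvisits <- ave(rep(1, nrow(sub)), sub$id, FUN = sum)  # measures per individual

## correlation between BLUP and residual, split by number of visits;
## expected to be strongest for individuals seen only once
sapply(split(data.frame(u, e), nvisits), function(d) cor(d$u, d$e))
```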

Thank you for your help! The bit about checking residuals in GLMMs is also very interesting; I'll think about DHARMa next time I have to do this for a GLMM!

Cheers,
Pierre

On Tuesday 27 February 2018 at 12:03:17 CET, Paul Johnson wrote:
[quoted text of Message 1 trimmed]

------------------------------

Subject: Digest Footer

_______________________________________________
R-sig-mixed-models mailing list
R-sig-mixed-models at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


------------------------------

End of R-sig-mixed-models Digest, Vol 134, Issue 42
***************************************************



