[R-sig-ME] diverging results with and without random effects

Mon Nov 26 16:36:52 CET 2018

Dear Andreas,

This is due to quasi complete separatation. This occurs when all responses
for a specific combination of levels are always TRUE or FALSE. In your
case, you have only two observations per patient. Hence adding the patient
as random effect, guarantees quasi complete separation issues.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas <
andreas.leha using med.uni-goettingen.de>:

> Hi all,
>
> sent the wrong code (w/o filtering for BL).  If you want to look at the
> data, please use this code:
>
> ---------- cut here --------------------------------------------
> library("dplyr")
> library("lme4")
> library("lmerTest")
> ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> library("pastebin")
>
> ## ---------------------------------- ##
> ## load the data                      ##
> ## ---------------------------------- ##
> dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
> "", .) %>% parse(text = .) %>% eval
>
>
>
> ## ---------------------------------- ##
> ## have a look                        ##
> ## ---------------------------------- ##
> dat
> ## ,----
> ## | # A tibble: 475 x 4
> ## |    patient group fu    riskfactor
> ## |    <fct>   <fct> <fct> <fct>
> ## |  1 p001    wt    BL    norisk
> ## |  2 p002    wt    BL    norisk
> ## |  3 p003    wt    BL    norisk
> ## |  4 p004    wt    BL    norisk
> ## |  5 p005    wt    BL    norisk
> ## |  6 p006    wt    BL    norisk
> ## |  7 p007    wt    BL    norisk
> ## |  8 p008    wt    BL    norisk
> ## |  9 p009    wt    BL    risk
> ## | 10 p010    wt    BL    norisk
> ## | # ... with 465 more rows
> ## `----
> dat %>% str
> ## ,----
> ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  475 obs. of  4 variables:
> ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
> 8 9 10 ...
> ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
> ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
> ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
> 1 2 ...
> ## `----
>
> ## there are 265 patients
> ## in 2 groups: "wt" and "mut"
> ## with a dichotomous risk factor ("risk" and "norisk")
> ## measured at two time points ("BL" and "FU")
>
> dat %>% summary
> ## ,----
> ## |     patient    group      fu       riskfactor
> ## |  p001   :  2   wt :209   BL:258   risk  :205
> ## |  p002   :  2   mut:266   FU:217   norisk:270
> ## |  p003   :  2
> ## |  p004   :  2
> ## |  p005   :  2
> ## |  p006   :  2
> ## |  (Other):463
> ## `----
>
> ## group sizes seem fine
>
>
>
> ## ---------------------------------------------- ##
> ## first, we look at the first time point, the BL ##
> ## ---------------------------------------------- ##
>
> ## we build a cross table
> tab_bl <-
>   dat %>%
>   dplyr::filter(fu == "BL") %>%
>   dplyr::select(group, riskfactor) %>%
>   table
> tab_bl
> ## ,----
> ## |      riskfactor
> ## | group risk norisk
> ## |   wt    22     86
> ## |   mut   87     63
> ## `----
>
> ## and we test using fisher:
> tab_bl %>% fisher.test
> ## ,----
> ## |    Fisher's Exact Test for Count Data
> ## |
> ## | data:  .
> ## | p-value = 1.18e-09
> ## | alternative hypothesis: true odds ratio is not equal to 1
> ## | 95 percent confidence interval:
> ## |  0.09986548 0.33817966
> ## | sample estimates:
> ## | odds ratio
> ## |  0.1865377
> ## `----
> log(0.187)
> ## ,----
> ## | [1] -1.676647
> ## `----
>
> ## so, we get a highly significant association of the riskfactor
> ## and the group with an log(odds ratio) of -1.7
>
> ## we get the same result using logistic regression:
> dat %>%
>   filter(fu == "BL") %>%
>   glm(group ~ riskfactor, family = "binomial", data = .) %>%
>   summary
> ## ,----
> ## | Call:
> ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> ## |
> ## | Deviance Residuals:
> ## |     Min       1Q   Median       3Q      Max
> ## | -1.7890  -1.0484   0.6715   0.6715   1.3121
> ## |
> ## | Coefficients:
> ## |                  Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
> ## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
> ## | ---
> ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | (Dispersion parameter for binomial family taken to be 1)
> ## |
> ## |     Null deviance: 350.80  on 257  degrees of freedom
> ## | Residual deviance: 312.63  on 256  degrees of freedom
> ## | AIC: 316.63
> ## |
> ## | Number of Fisher Scoring iterations: 4
> ## `----
>
>
>
> ## ------------------------------------------------- ##
> ## Now, we analyse both time points with interaction ##
> ## ------------------------------------------------- ##
>
> dat %>%
>   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
> "binomial", data = .) %>%
>   summary
> ## ,----
> ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> ## |   Approximation) [glmerMod]
> ## |  Family: binomial  ( logit )
> ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> ## |    Data: .
> ## |
> ## |      AIC      BIC   logLik deviance df.resid
> ## |    345.2    366.0   -167.6    335.2      470
> ## |
> ## | Scaled residuals:
> ## |       Min        1Q    Median        3Q       Max
> ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
> ## |
> ## | Random effects:
> ## |  Groups  Name        Variance Std.Dev.
> ## |  patient (Intercept) 1849     43
> ## | Number of obs: 475, groups:  patient, 265
> ## |
> ## | Fixed effects:
> ## |                       Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
> ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
> ## | fuFU                    0.4596     1.9165   0.240    0.810
> ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
> ## | ---
> ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | Correlation of Fixed Effects:
> ## |             (Intr) rskfct fuFU
> ## | rskfctrnrsk -0.746
> ## | fuFU        -0.513  0.510
> ## | rskfctrn:FU  0.478 -0.576 -0.908
> ## `----
>
> ## I get huge variation in the random effects
> ##
> ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> ## but one which is not significant
> ---------- cut here --------------------------------------------
>
>
> On 26/11/18 12:10, Leha, Andreas wrote:
> > Hi all,
> >
> > I am interested in assessing the association of a (potential) risk
> > factor to a (binary) grouping.
> >
> > I am having trouble with diverging results from modeling one time point
> > (without random effect) and modeling two time points (with random
> effect).
> >
> > When analysing the first time point (base line, BL) only, I get a highly
> > significant association.
> > Now, I want to see, whether there is an interaction between time and
> > risk factor (the risk factor is not constant).  But when analysing both
> > time points, the estimated effect at BL is estimated to be not
> significant.
> >
> > Now my simplified questions are:
> > (1) Is there an association at BL or not?
> > (2) How should I analyse both time points with this data?
> >
> > The aim is to look for confounding with other factors.  But I'd like to
> > understand the simple models before moving on.
> >
> > Below you find a reproducible example and the detailed results.
> >
> > Any suggestions would be highly appreciated!
> >
> > Regards,
> > Andreas
> >
> >
> >
> > PS: The code / results
> >
> > ---------- cut here --------------------------------------------
> > library("dplyr")
> > library("lme4")
> > library("lmerTest")
> > ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> > library("pastebin")
> >
> > ## ---------------------------------- ##
> > ## load the data                      ##
> > ## ---------------------------------- ##
> > dat <- pastebin::get_paste("Xgwgtb7j") %>%
> >   as.character %>%
> >   gsub("\r\n", "", .) %>%
> >   parse(text = .) %>%
> >   eval
> >
> >
> >
> > ## ---------------------------------- ##
> > ## have a look                        ##
> > ## ---------------------------------- ##
> > dat
> > ## ,----
> > ## | # A tibble: 475 x 4
> > ## |    patient group fu    riskfactor
> > ## |    <fct>   <fct> <fct> <fct>
> > ## |  1 p001    wt    BL    norisk
> > ## |  2 p002    wt    BL    norisk
> > ## |  3 p003    wt    BL    norisk
> > ## |  4 p004    wt    BL    norisk
> > ## |  5 p005    wt    BL    norisk
> > ## |  6 p006    wt    BL    norisk
> > ## |  7 p007    wt    BL    norisk
> > ## |  8 p008    wt    BL    norisk
> > ## |  9 p009    wt    BL    risk
> > ## | 10 p010    wt    BL    norisk
> > ## | # ... with 465 more rows
> > ## `----
> > dat %>% str
> > ## ,----
> > ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':        475 obs. of  4
> variables:
> > ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
> > 8 9 10 ...
> > ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1
> ...
> > ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
> > ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
> > 1 2 ...
> > ## `----
> >
> > ## there are 265 patients
> > ## in 2 groups: "wt" and "mut"
> > ## with a dichotomous risk factor ("risk" and "norisk")
> > ## measured at two time points ("BL" and "FU")
> >
> > dat %>% summary
> > ## ,----
> > ## |     patient    group      fu       riskfactor
> > ## |  p001   :  2   wt :209   BL:258   risk  :205
> > ## |  p002   :  2   mut:266   FU:217   norisk:270
> > ## |  p003   :  2
> > ## |  p004   :  2
> > ## |  p005   :  2
> > ## |  p006   :  2
> > ## |  (Other):463
> > ## `----
> >
> > ## group sizes seem fine
> >
> >
> >
> > ## ---------------------------------------------- ##
> > ## first, we look at the first time point, the BL ##
> > ## ---------------------------------------------- ##
> >
> > ## we build a cross table
> > tab_bl <-
> >   dat %>%
> >   dplyr::select(group, riskfactor) %>%
> >   table
> > tab_bl
> > ## ,----
> > ## |      riskfactor
> > ## | group risk norisk
> > ## |   wt    35    174
> > ## |   mut  170     96
> > ## `----
> >
> > ## and we test using fisher:
> > tab_bl %>% fisher.test
> > ## ,----
> > ## |    Fisher's Exact Test for Count Data
> > ## |
> > ## | data:  .
> > ## | p-value < 2.2e-16
> > ## | alternative hypothesis: true odds ratio is not equal to 1
> > ## | 95 percent confidence interval:
> > ## |  0.07099792 0.18002325
> > ## | sample estimates:
> > ## | odds ratio
> > ## |  0.1141677
> > ## `----
> > log(0.114)
> > ## ,----
> > ## | [1] -2.171557
> > ## `----
> >
> > ## so, we get a highly significant association of the riskfactor
> > ## and the group with an log(odds ratio) of -2.2
> >
> > ## we get the same result using logistic regression:
> > dat %>%
> >   glm(group ~ riskfactor, family = "binomial", data = .) %>%
> >   summary
> > ## ,----
> > ## |
> > ## | Call:
> > ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> > ## |
> > ## | Deviance Residuals:
> > ## |     Min       1Q   Median       3Q      Max
> > ## | -1.8802  -0.9374   0.6119   0.6119   1.4381
> > ## |
> > ## | Coefficients:
> > ## |                  Estimate Std. Error z value Pr(>|z|)
> > ## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
> > ## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
> > ## | ---
> > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > ## |
> > ## | (Dispersion parameter for binomial family taken to be 1)
> > ## |
> > ## |     Null deviance: 651.63  on 474  degrees of freedom
> > ## | Residual deviance: 538.83  on 473  degrees of freedom
> > ## | AIC: 542.83
> > ## |
> > ## | Number of Fisher Scoring iterations: 4
> > ## `----
> >
> >
> >
> > ## ------------------------------------------------- ##
> > ## Now, we analyse both time points with interaction ##
> > ## ------------------------------------------------- ##
> >
> > dat %>%
> >   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
> > "binomial", data = .) %>%
> >   summary
> > ## ,----
> > ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> > ## |   Approximation) [glmerMod]
> > ## |  Family: binomial  ( logit )
> > ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> > ## |    Data: .
> > ## |
> > ## |      AIC      BIC   logLik deviance df.resid
> > ## |    345.2    366.0   -167.6    335.2      470
> > ## |
> > ## | Scaled residuals:
> > ## |       Min        1Q    Median        3Q       Max
> > ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
> > ## |
> > ## | Random effects:
> > ## |  Groups  Name        Variance Std.Dev.
> > ## |  patient (Intercept) 1849     43
> > ## | Number of obs: 475, groups:  patient, 265
> > ## |
> > ## | Fixed effects:
> > ## |                       Estimate Std. Error z value Pr(>|z|)
> > ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
> > ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
> > ## | fuFU                    0.4596     1.9165   0.240    0.810
> > ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
> > ## | ---
> > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > ## |
> > ## | Correlation of Fixed Effects:
> > ## |             (Intr) rskfct fuFU
> > ## | rskfctrnrsk -0.746
> > ## | fuFU        -0.513  0.510
> > ## | rskfctrn:FU  0.478 -0.576 -0.908
> > ## `----
> >
> > ## I get huge variation in the random effects
> > ##
> > ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> > ## but one which is not significant
> > ---------- cut here --------------------------------------------
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> --
> Dr. Andreas Leha
> Head of the 'Core Facility
> Medical Biometry and Statistical Bioinformatics'
>
> UNIVERSITY MEDICAL CENTER GÖTTINGEN
> GEORG-AUGUST-UNIVERSITÄT
> Department of Medical Statistics
> Humboldtallee 32
> 37073 Göttingen
> Mailing Address: 37099 Göttingen, Germany
> Fax: +49 (0) 551 39-4995
> Tel: +49 (0) 551 39-4987
> http://www.ams.med.uni-goettingen.de/service-de.shtml
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]