[R-sig-ME] diverging results with and without random effects

Mon Nov 26 13:47:57 CET 2018

Hi all,

I sent the wrong code (without filtering for BL).  If you want to look at
the data, please use this code:

---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")

## ---------------------------------- ##
## load the data                      ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
  as.character %>%
  gsub("\r\n", "", .) %>%
  parse(text = .) %>%
  eval

## ---------------------------------- ##
## have a look                        ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## |    patient group fu    riskfactor
## |    <fct>   <fct> <fct> <fct>
## |  1 p001    wt    BL    norisk
## |  2 p002    wt    BL    norisk
## |  3 p003    wt    BL    norisk
## |  4 p004    wt    BL    norisk
## |  5 p005    wt    BL    norisk
## |  6 p006    wt    BL    norisk
## |  7 p007    wt    BL    norisk
## |  8 p008    wt    BL    norisk
## |  9 p009    wt    BL    risk
## | 10 p010    wt    BL    norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	475 obs. of  4 variables:
## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7 8 9 10 ...
## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2 1 2 ...
## `----

## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")

dat %>% summary
## ,----
## |     patient    group      fu       riskfactor
## |  p001   :  2   wt :209   BL:258   risk  :205
## |  p002   :  2   mut:266   FU:217   norisk:270
## |  p003   :  2
## |  p004   :  2
## |  p005   :  2
## |  p006   :  2
## |  (Other):463
## `----

## group sizes seem fine

## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##

## we build a cross table
tab_bl <-
  dat %>%
  dplyr::filter(fu == "BL") %>%
  dplyr::select(group, riskfactor) %>%
  table
tab_bl
## ,----
## |      riskfactor
## | group risk norisk
## |   wt    22     86
## |   mut   87     63
## `----

## and we test using Fisher's exact test:
tab_bl %>% fisher.test
## ,----
## | 	Fisher's Exact Test for Count Data
## |
## | data:  .
## | p-value = 1.18e-09
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## |  0.09986548 0.33817966
## | sample estimates:
## | odds ratio
## |  0.1865377
## `----
log(0.187)
## ,----
## | [1] -1.676647
## `----

## so, we get a highly significant association of the riskfactor
## and the group with a log(odds ratio) of -1.7

## we get essentially the same estimate (-1.69) using logistic regression:
dat %>%
  filter(fu == "BL") %>%
  glm(group ~ riskfactor, family = "binomial", data = .) %>%
  summary
## ,----
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## |     Min       1Q   Median       3Q      Max
## | -1.7890  -1.0484   0.6715   0.6715   1.3121
## |
## | Coefficients:
## |                  Estimate Std. Error z value Pr(>|z|)
## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## |     Null deviance: 350.80  on 257  degrees of freedom
## | Residual deviance: 312.63  on 256  degrees of freedom
## | AIC: 316.63
## |
## | Number of Fisher Scoring iterations: 4
## `----

## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##

dat %>%
  glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
        family = "binomial", data = .) %>%
  summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## |   Approximation) [glmerMod]
## |  Family: binomial  ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## |    Data: .
## |
## |      AIC      BIC   logLik deviance df.resid
## |    345.2    366.0   -167.6    335.2      470
## |
## | Scaled residuals:
## |       Min        1Q    Median        3Q       Max
## | -0.095863 -0.058669  0.002278  0.002866  0.007324
## |
## | Random effects:
## |  Groups  Name        Variance Std.Dev.
## |  patient (Intercept) 1849     43
## | Number of obs: 475, groups:  patient, 265
## |
## | Fixed effects:
## |                       Estimate Std. Error z value Pr(>|z|)
## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
## | fuFU                    0.4596     1.9165   0.240    0.810
## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## |             (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU        -0.513  0.510
## | rskfctrn:FU  0.478 -0.576 -0.908
## `----

## The estimated random-intercept variance is huge (1849, SD 43)
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6,
## which is not significant (p = 0.261)
---------- cut here --------------------------------------------
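
The BL logistic regression coefficient can be reproduced by hand from the cross table. What follows is a minimal sketch, assuming the BL counts printed above (22, 86, 87, 63); the names `tab`, `agg`, `or_sample` and `or_fisher` are made up for illustration. The sample log(odds ratio) agrees with the glm estimate, while fisher.test reports the conditional maximum likelihood estimate, which is why its odds ratio differs slightly.

```r
## BL counts copied from the cross table above (filled column-wise)
tab <- matrix(c(22, 87, 86, 63), nrow = 2,
              dimnames = list(group = c("wt", "mut"),
                              riskfactor = c("risk", "norisk")))

## sample (cross-product) odds ratio and its log
or_sample <- (tab["wt", "risk"] * tab["mut", "norisk"]) /
             (tab["wt", "norisk"] * tab["mut", "risk"])
log(or_sample)   # about -1.686, the glm coefficient for riskfactornorisk

## the same logistic regression, fitted on aggregated counts;
## "mut" is the modelled outcome, as in glm(group ~ riskfactor)
agg <- data.frame(
  riskfactor = factor(c("risk", "norisk"), levels = c("risk", "norisk")),
  mut = c(87, 63),
  wt  = c(22, 86)
)
fit <- glm(cbind(mut, wt) ~ riskfactor, family = binomial, data = agg)
coef(fit)

## fisher.test gives the conditional MLE of the odds ratio instead
or_fisher <- unname(fisher.test(tab)$estimate)
or_fisher
```

So the Fisher test and the logistic regression describe the same association at BL; the small gap between -1.68 and -1.69 comes from the two estimators, not from the data.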

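One check that might help explain the huge random-intercept variance: if `group` never changes within a patient, a per-patient intercept alone can predict the response almost perfectly, and the variance estimate can then drift towards very large values. The sketch below only shows how such a check could look. It runs on a small made-up data frame (`toy`, with hypothetical patients p1 to p3), not on the real data, and it is an assumption, not something verified here, that `group` is fixed per patient in `dat`.

```r
## hypothetical toy data with the same column layout as dat
toy <- data.frame(
  patient = factor(c("p1", "p1", "p2", "p2", "p3")),
  group   = factor(c("wt", "wt", "mut", "mut", "wt"), levels = c("wt", "mut")),
  fu      = factor(c("BL", "FU", "BL", "FU", "BL"), levels = c("BL", "FU"))
)

## number of distinct group values observed per patient
n_groups_per_patient <- tapply(toy$group, toy$patient,
                               function(g) length(unique(g)))
n_groups_per_patient

## TRUE means the response is constant within every patient
all(n_groups_per_patient == 1)
```

Replacing `toy` by `dat` would run the same check on the real data.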
On 26/11/18 12:10, Leha, Andreas wrote:
> Hi all,
> 
> I am interested in assessing the association of a (potential) risk
> factor to a (binary) grouping.
> 
> I am having trouble with diverging results from modeling one time point
> (without random effect) and modeling two time points (with random effect).
> 
> When analysing the first time point (base line, BL) only, I get a highly
> significant association.
> Now I want to see whether there is an interaction between time and
> risk factor (the risk factor is not constant).  But when analysing both
> time points, the effect at BL is estimated to be not significant.
> 
> Now my simplified questions are:
> (1) Is there an association at BL or not?
> (2) How should I analyse both time points with this data?
> 
> The aim is to look for confounding with other factors.  But I'd like to
> understand the simple models before moving on.
> 
> Below you find a reproducible example and the detailed results.
> 
> Any suggestions would be highly appreciated!
> 
> Regards,
> Andreas
> 
> 
> 
> PS: The code / results
> 
> ---------- cut here --------------------------------------------
> library("dplyr")
> library("lme4")
> library("lmerTest")
> ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> library("pastebin")
> 
> ## ---------------------------------- ##
> ## load the data                      ##
> ## ---------------------------------- ##
> dat <- pastebin::get_paste("Xgwgtb7j") %>%
>   as.character %>%
>   gsub("\r\n", "", .) %>%
>   parse(text = .) %>%
>   eval
> 
> 
> 
> ## ---------------------------------- ##
> ## have a look                        ##
> ## ---------------------------------- ##
> dat
> ## ,----
> ## | # A tibble: 475 x 4
> ## |    patient group fu    riskfactor
> ## |    <fct>   <fct> <fct> <fct>
> ## |  1 p001    wt    BL    norisk
> ## |  2 p002    wt    BL    norisk
> ## |  3 p003    wt    BL    norisk
> ## |  4 p004    wt    BL    norisk
> ## |  5 p005    wt    BL    norisk
> ## |  6 p006    wt    BL    norisk
> ## |  7 p007    wt    BL    norisk
> ## |  8 p008    wt    BL    norisk
> ## |  9 p009    wt    BL    risk
> ## | 10 p010    wt    BL    norisk
> ## | # ... with 465 more rows
> ## `----
> dat %>% str
> ## ,----
> ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	475 obs. of  4 variables:
> ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7 8 9 10 ...
> ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
> ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
> ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2 1 2 ...
> ## `----
> 
> ## there are 265 patients
> ## in 2 groups: "wt" and "mut"
> ## with a dichotomous risk factor ("risk" and "norisk")
> ## measured at two time points ("BL" and "FU")
> 
> dat %>% summary
> ## ,----
> ## |     patient    group      fu       riskfactor
> ## |  p001   :  2   wt :209   BL:258   risk  :205
> ## |  p002   :  2   mut:266   FU:217   norisk:270
> ## |  p003   :  2
> ## |  p004   :  2
> ## |  p005   :  2
> ## |  p006   :  2
> ## |  (Other):463
> ## `----
> 
> ## group sizes seem fine
> 
> 
> 
> ## ---------------------------------------------- ##
> ## first, we look at the first time point, the BL ##
> ## ---------------------------------------------- ##
> 
> ## we build a cross table
> tab_bl <-
>   dat %>%
>   dplyr::select(group, riskfactor) %>%
>   table
> tab_bl
> ## ,----
> ## |      riskfactor
> ## | group risk norisk
> ## |   wt    35    174
> ## |   mut  170     96
> ## `----
> 
> ## and we test using fisher:
> tab_bl %>% fisher.test
> ## ,----
> ## |    Fisher's Exact Test for Count Data
> ## |
> ## | data:  .
> ## | p-value < 2.2e-16
> ## | alternative hypothesis: true odds ratio is not equal to 1
> ## | 95 percent confidence interval:
> ## |  0.07099792 0.18002325
> ## | sample estimates:
> ## | odds ratio
> ## |  0.1141677
> ## `----
> log(0.114)
> ## ,----
> ## | [1] -2.171557
> ## `----
> 
> ## so, we get a highly significant association of the riskfactor
> ## and the group with a log(odds ratio) of -2.2
> 
> ## we get the same result using logistic regression:
> dat %>%
>   glm(group ~ riskfactor, family = "binomial", data = .) %>%
>   summary
> ## ,----
> ## |
> ## | Call:
> ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> ## |
> ## | Deviance Residuals:
> ## |     Min       1Q   Median       3Q      Max
> ## | -1.8802  -0.9374   0.6119   0.6119   1.4381
> ## |
> ## | Coefficients:
> ## |                  Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
> ## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
> ## | ---
> ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | (Dispersion parameter for binomial family taken to be 1)
> ## |
> ## |     Null deviance: 651.63  on 474  degrees of freedom
> ## | Residual deviance: 538.83  on 473  degrees of freedom
> ## | AIC: 542.83
> ## |
> ## | Number of Fisher Scoring iterations: 4
> ## `----
> 
> 
> 
> ## ------------------------------------------------- ##
> ## Now, we analyse both time points with interaction ##
> ## ------------------------------------------------- ##
> 
> dat %>%
>   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
>         family = "binomial", data = .) %>%
>   summary
> ## ,----
> ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> ## |   Approximation) [glmerMod]
> ## |  Family: binomial  ( logit )
> ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> ## |    Data: .
> ## |
> ## |      AIC      BIC   logLik deviance df.resid
> ## |    345.2    366.0   -167.6    335.2      470
> ## |
> ## | Scaled residuals:
> ## |       Min        1Q    Median        3Q       Max
> ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
> ## |
> ## | Random effects:
> ## |  Groups  Name        Variance Std.Dev.
> ## |  patient (Intercept) 1849     43
> ## | Number of obs: 475, groups:  patient, 265
> ## |
> ## | Fixed effects:
> ## |                       Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
> ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
> ## | fuFU                    0.4596     1.9165   0.240    0.810
> ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
> ## | ---
> ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | Correlation of Fixed Effects:
> ## |             (Intr) rskfct fuFU
> ## | rskfctrnrsk -0.746
> ## | fuFU        -0.513  0.510
> ## | rskfctrn:FU  0.478 -0.576 -0.908
> ## `----
> 
> ## I get huge variation in the random effects
> ##
> ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> ## but one which is not significant
> ---------- cut here --------------------------------------------
> 

-- 
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'

UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml