[R-sig-ME] diverging results with and without random effects
Leha, Andreas
@ndre@@@leh@ @ending from med@uni-goettingen@de
Mon Nov 26 13:47:57 CET 2018
Hi all,
sent the wrong code (w/o filtering for BL). If you want to look at the
data, please use this code:
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
"", .) %>% parse(text = .) %>% eval
## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
dat %>%
dplyr::filter(fu == "BL") %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 22 86
## | mut 87 63
## `----
## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value = 1.18e-09
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.09986548 0.33817966
## | sample estimates:
## | odds ratio
## | 0.1865377
## `----
log(0.187)
## ,----
## | [1] -1.676647
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -1.7
## we get the same result using logistic regression:
dat %>%
filter(fu == "BL") %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.7890 -1.0484 0.6715 0.6715 1.3121
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.3749 0.2386 5.761 8.35e-09 ***
## | riskfactornorisk -1.6861 0.2906 -5.802 6.55e-09 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 350.80 on 257 degrees of freedom
## | Residual deviance: 312.63 on 256 degrees of freedom
## | AIC: 316.63
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
On 26/11/18 12:10, Leha, Andreas wrote:
> Hi all,
>
> I am interested in assessing the association of a (potential) risk
> factor to a (binary) grouping.
>
> I am having trouble with diverging results from modeling one time point
> (without random effect) and modeling two time points (with random effect).
>
> When analysing the first time point (base line, BL) only, I get a highly
> significant association.
> Now, I want to see, whether there is an interaction between time and
> risk factor (the risk factor is not constant). But when analysing both
> time points, the estimated effect at BL is estimated to be not significant.
>
> Now my simplified questions are:
> (1) Is there an association at BL or not?
> (2) How should I analyse both time points with this data?
>
> The aim is to look for confounding with other factors. But I'd like to
> understand the simple models before moving on.
>
> Below you find a reproducible example and the detailed results.
>
> Any suggestions would be highly appreciated!
>
> Regards,
> Andreas
>
>
>
> PS: The code / results
>
> ---------- cut here --------------------------------------------
> library("dplyr")
> library("lme4")
> library("lmerTest")
> ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> library("pastebin")
>
> ## ---------------------------------- ##
> ## load the data ##
> ## ---------------------------------- ##
> dat <- pastebin::get_paste("Xgwgtb7j") %>%
> as.character %>%
> gsub("\r\n", "", .) %>%
> parse(text = .) %>%
> eval
>
>
>
> ## ---------------------------------- ##
> ## have a look ##
> ## ---------------------------------- ##
> dat
> ## ,----
> ## | # A tibble: 475 x 4
> ## | patient group fu riskfactor
> ## | <fct> <fct> <fct> <fct>
> ## | 1 p001 wt BL norisk
> ## | 2 p002 wt BL norisk
> ## | 3 p003 wt BL norisk
> ## | 4 p004 wt BL norisk
> ## | 5 p005 wt BL norisk
> ## | 6 p006 wt BL norisk
> ## | 7 p007 wt BL norisk
> ## | 8 p008 wt BL norisk
> ## | 9 p009 wt BL risk
> ## | 10 p010 wt BL norisk
> ## | # ... with 465 more rows
> ## `----
> dat %>% str
> ## ,----
> ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
> ## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
> 8 9 10 ...
> ## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
> ## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
> ## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
> 1 2 ...
> ## `----
>
> ## there are 265 patients
> ## in 2 groups: "wt" and "mut"
> ## with a dichotomous risk factor ("risk" and "norisk")
> ## measured at two time points ("BL" and "FU")
>
> dat %>% summary
> ## ,----
> ## | patient group fu riskfactor
> ## | p001 : 2 wt :209 BL:258 risk :205
> ## | p002 : 2 mut:266 FU:217 norisk:270
> ## | p003 : 2
> ## | p004 : 2
> ## | p005 : 2
> ## | p006 : 2
> ## | (Other):463
> ## `----
>
> ## group sizes seem fine
>
>
>
> ## ---------------------------------------------- ##
> ## first, we look at the first time point, the BL ##
> ## ---------------------------------------------- ##
>
> ## we build a cross table
> tab_bl <-
> dat %>%
> dplyr::select(group, riskfactor) %>%
> table
> tab_bl
> ## ,----
> ## | riskfactor
> ## | group risk norisk
> ## | wt 35 174
> ## | mut 170 96
> ## `----
>
> ## and we test using fisher:
> tab_bl %>% fisher.test
> ## ,----
> ## | Fisher's Exact Test for Count Data
> ## |
> ## | data: .
> ## | p-value < 2.2e-16
> ## | alternative hypothesis: true odds ratio is not equal to 1
> ## | 95 percent confidence interval:
> ## | 0.07099792 0.18002325
> ## | sample estimates:
> ## | odds ratio
> ## | 0.1141677
> ## `----
> log(0.114)
> ## ,----
> ## | [1] -2.171557
> ## `----
>
> ## so, we get a highly significant association of the riskfactor
> ## and the group with an log(odds ratio) of -2.2
>
> ## we get the same result using logistic regression:
> dat %>%
> glm(group ~ riskfactor, family = "binomial", data = .) %>%
> summary
> ## ,----
> ## |
> ## | Call:
> ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> ## |
> ## | Deviance Residuals:
> ## | Min 1Q Median 3Q Max
> ## | -1.8802 -0.9374 0.6119 0.6119 1.4381
> ## |
> ## | Coefficients:
> ## | Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
> ## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
> ## | ---
> ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | (Dispersion parameter for binomial family taken to be 1)
> ## |
> ## | Null deviance: 651.63 on 474 degrees of freedom
> ## | Residual deviance: 538.83 on 473 degrees of freedom
> ## | AIC: 542.83
> ## |
> ## | Number of Fisher Scoring iterations: 4
> ## `----
>
>
>
> ## ------------------------------------------------- ##
> ## Now, we analyse both time points with interaction ##
> ## ------------------------------------------------- ##
>
> dat %>%
> glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
> "binomial", data = .) %>%
> summary
> ## ,----
> ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> ## | Approximation) [glmerMod]
> ## | Family: binomial ( logit )
> ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> ## | Data: .
> ## |
> ## | AIC BIC logLik deviance df.resid
> ## | 345.2 366.0 -167.6 335.2 470
> ## |
> ## | Scaled residuals:
> ## | Min 1Q Median 3Q Max
> ## | -0.095863 -0.058669 0.002278 0.002866 0.007324
> ## |
> ## | Random effects:
> ## | Groups Name Variance Std.Dev.
> ## | patient (Intercept) 1849 43
> ## | Number of obs: 475, groups: patient, 265
> ## |
> ## | Fixed effects:
> ## | Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
> ## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
> ## | fuFU 0.4596 1.9165 0.240 0.810
> ## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
> ## | ---
> ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | Correlation of Fixed Effects:
> ## | (Intr) rskfct fuFU
> ## | rskfctrnrsk -0.746
> ## | fuFU -0.513 0.510
> ## | rskfctrn:FU 0.478 -0.576 -0.908
> ## `----
>
> ## I get huge variation in the random effects
> ##
> ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> ## but one which is not significant
> ---------- cut here --------------------------------------------
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'
UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
More information about the R-sig-mixed-models
mailing list