[R-sig-ME] diverging results with and without random effects
Leha, Andreas
@ndre@@@leh@ @ending from med@uni-goettingen@de
Mon Nov 26 17:00:53 CET 2018
Dear Thierry,
thanks for looking into this!
So, one solution would be a baysian analysis, right?
Would you have a recommendation for me?
I followed [1] and used
library("blme")
dat %>%
bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
family = "binomial",
data = .,
fixef.prior = normal(cov = diag(9,4))) %>%
summary
Which runs and gives the following fixed effect estimates:
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.2598 0.7445 11.094 <2e-16 ***
riskfactornorisk -16.0942 1.3085 -12.300 <2e-16 ***
fuFU 1.0019 1.0047 0.997 0.319
riskfactornorisk:fuFU -1.8675 1.2365 -1.510 0.131
These still do not seem reasonable.
Thanks in advance!
Regards,
Andreas
[1]
https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678
On 26/11/18 16:36, Thierry Onkelinx wrote:
> Dear Andreas,
>
> This is due to quasi complete separatation. This occurs when all
> responses for a specific combination of levels are always TRUE or FALSE.
> In your case, you have only two observations per patient. Hence adding
> the patient as random effect, guarantees quasi complete separation issues.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be <http://www.inbo.be>
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data. ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas
> <andreas.leha using med.uni-goettingen.de
> <mailto:andreas.leha using med.uni-goettingen.de>>:
>
> Hi all,
>
> sent the wrong code (w/o filtering for BL). If you want to look at the
> data, please use this code:
>
> ---------- cut here --------------------------------------------
> library("dplyr")
> library("lme4")
> library("lmerTest")
> ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> library("pastebin")
>
> ## ---------------------------------- ##
> ## load the data ##
> ## ---------------------------------- ##
> dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
> "", .) %>% parse(text = .) %>% eval
>
>
>
> ## ---------------------------------- ##
> ## have a look ##
> ## ---------------------------------- ##
> dat
> ## ,----
> ## | # A tibble: 475 x 4
> ## | patient group fu riskfactor
> ## | <fct> <fct> <fct> <fct>
> ## | 1 p001 wt BL norisk
> ## | 2 p002 wt BL norisk
> ## | 3 p003 wt BL norisk
> ## | 4 p004 wt BL norisk
> ## | 5 p005 wt BL norisk
> ## | 6 p006 wt BL norisk
> ## | 7 p007 wt BL norisk
> ## | 8 p008 wt BL norisk
> ## | 9 p009 wt BL risk
> ## | 10 p010 wt BL norisk
> ## | # ... with 465 more rows
> ## `----
> dat %>% str
> ## ,----
> ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4
> variables:
> ## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
> 8 9 10 ...
> ## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1
> 1 ...
> ## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1
> 1 ...
> ## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
> 1 2 ...
> ## `----
>
> ## there are 265 patients
> ## in 2 groups: "wt" and "mut"
> ## with a dichotomous risk factor ("risk" and "norisk")
> ## measured at two time points ("BL" and "FU")
>
> dat %>% summary
> ## ,----
> ## | patient group fu riskfactor
> ## | p001 : 2 wt :209 BL:258 risk :205
> ## | p002 : 2 mut:266 FU:217 norisk:270
> ## | p003 : 2
> ## | p004 : 2
> ## | p005 : 2
> ## | p006 : 2
> ## | (Other):463
> ## `----
>
> ## group sizes seem fine
>
>
>
> ## ---------------------------------------------- ##
> ## first, we look at the first time point, the BL ##
> ## ---------------------------------------------- ##
>
> ## we build a cross table
> tab_bl <-
> dat %>%
> dplyr::filter(fu == "BL") %>%
> dplyr::select(group, riskfactor) %>%
> table
> tab_bl
> ## ,----
> ## | riskfactor
> ## | group risk norisk
> ## | wt 22 86
> ## | mut 87 63
> ## `----
>
> ## and we test using fisher:
> tab_bl %>% fisher.test
> ## ,----
> ## | Fisher's Exact Test for Count Data
> ## |
> ## | data: .
> ## | p-value = 1.18e-09
> ## | alternative hypothesis: true odds ratio is not equal to 1
> ## | 95 percent confidence interval:
> ## | 0.09986548 0.33817966
> ## | sample estimates:
> ## | odds ratio
> ## | 0.1865377
> ## `----
> log(0.187)
> ## ,----
> ## | [1] -1.676647
> ## `----
>
> ## so, we get a highly significant association of the riskfactor
> ## and the group with an log(odds ratio) of -1.7
>
> ## we get the same result using logistic regression:
> dat %>%
> filter(fu == "BL") %>%
> glm(group ~ riskfactor, family = "binomial", data = .) %>%
> summary
> ## ,----
> ## | Call:
> ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> ## |
> ## | Deviance Residuals:
> ## | Min 1Q Median 3Q Max
> ## | -1.7890 -1.0484 0.6715 0.6715 1.3121
> ## |
> ## | Coefficients:
> ## | Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept) 1.3749 0.2386 5.761 8.35e-09 ***
> ## | riskfactornorisk -1.6861 0.2906 -5.802 6.55e-09 ***
> ## | ---
> ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | (Dispersion parameter for binomial family taken to be 1)
> ## |
> ## | Null deviance: 350.80 on 257 degrees of freedom
> ## | Residual deviance: 312.63 on 256 degrees of freedom
> ## | AIC: 316.63
> ## |
> ## | Number of Fisher Scoring iterations: 4
> ## `----
>
>
>
> ## ------------------------------------------------- ##
> ## Now, we analyse both time points with interaction ##
> ## ------------------------------------------------- ##
>
> dat %>%
> glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
> "binomial", data = .) %>%
> summary
> ## ,----
> ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> ## | Approximation) [glmerMod]
> ## | Family: binomial ( logit )
> ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> ## | Data: .
> ## |
> ## | AIC BIC logLik deviance df.resid
> ## | 345.2 366.0 -167.6 335.2 470
> ## |
> ## | Scaled residuals:
> ## | Min 1Q Median 3Q Max
> ## | -0.095863 -0.058669 0.002278 0.002866 0.007324
> ## |
> ## | Random effects:
> ## | Groups Name Variance Std.Dev.
> ## | patient (Intercept) 1849 43
> ## | Number of obs: 475, groups: patient, 265
> ## |
> ## | Fixed effects:
> ## | Estimate Std. Error z value Pr(>|z|)
> ## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
> ## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
> ## | fuFU 0.4596 1.9165 0.240 0.810
> ## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
> ## | ---
> ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ## |
> ## | Correlation of Fixed Effects:
> ## | (Intr) rskfct fuFU
> ## | rskfctrnrsk -0.746
> ## | fuFU -0.513 0.510
> ## | rskfctrn:FU 0.478 -0.576 -0.908
> ## `----
>
> ## I get huge variation in the random effects
> ##
> ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> ## but one which is not significant
> ---------- cut here --------------------------------------------
>
>
> On 26/11/18 12:10, Leha, Andreas wrote:
> > Hi all,
> >
> > I am interested in assessing the association of a (potential) risk
> > factor to a (binary) grouping.
> >
> > I am having trouble with diverging results from modeling one time
> point
> > (without random effect) and modeling two time points (with random
> effect).
> >
> > When analysing the first time point (base line, BL) only, I get a
> highly
> > significant association.
> > Now, I want to see, whether there is an interaction between time and
> > risk factor (the risk factor is not constant). But when analysing
> both
> > time points, the estimated effect at BL is estimated to be not
> significant.
> >
> > Now my simplified questions are:
> > (1) Is there an association at BL or not?
> > (2) How should I analyse both time points with this data?
> >
> > The aim is to look for confounding with other factors. But I'd
> like to
> > understand the simple models before moving on.
> >
> > Below you find a reproducible example and the detailed results.
> >
> > Any suggestions would be highly appreciated!
> >
> > Regards,
> > Andreas
> >
> >
> >
> > PS: The code / results
> >
> > ---------- cut here --------------------------------------------
> > library("dplyr")
> > library("lme4")
> > library("lmerTest")
> > ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> > library("pastebin")
> >
> > ## ---------------------------------- ##
> > ## load the data ##
> > ## ---------------------------------- ##
> > dat <- pastebin::get_paste("Xgwgtb7j") %>%
> > as.character %>%
> > gsub("\r\n", "", .) %>%
> > parse(text = .) %>%
> > eval
> >
> >
> >
> > ## ---------------------------------- ##
> > ## have a look ##
> > ## ---------------------------------- ##
> > dat
> > ## ,----
> > ## | # A tibble: 475 x 4
> > ## | patient group fu riskfactor
> > ## | <fct> <fct> <fct> <fct>
> > ## | 1 p001 wt BL norisk
> > ## | 2 p002 wt BL norisk
> > ## | 3 p003 wt BL norisk
> > ## | 4 p004 wt BL norisk
> > ## | 5 p005 wt BL norisk
> > ## | 6 p006 wt BL norisk
> > ## | 7 p007 wt BL norisk
> > ## | 8 p008 wt BL norisk
> > ## | 9 p009 wt BL risk
> > ## | 10 p010 wt BL norisk
> > ## | # ... with 465 more rows
> > ## `----
> > dat %>% str
> > ## ,----
> > ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of
> 4 variables:
> > ## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4
> 5 6 7
> > 8 9 10 ...
> > ## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1
> 1 1 ...
> > ## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1
> 1 1 ...
> > ## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2
> 2 2 2
> > 1 2 ...
> > ## `----
> >
> > ## there are 265 patients
> > ## in 2 groups: "wt" and "mut"
> > ## with a dichotomous risk factor ("risk" and "norisk")
> > ## measured at two time points ("BL" and "FU")
> >
> > dat %>% summary
> > ## ,----
> > ## | patient group fu riskfactor
> > ## | p001 : 2 wt :209 BL:258 risk :205
> > ## | p002 : 2 mut:266 FU:217 norisk:270
> > ## | p003 : 2
> > ## | p004 : 2
> > ## | p005 : 2
> > ## | p006 : 2
> > ## | (Other):463
> > ## `----
> >
> > ## group sizes seem fine
> >
> >
> >
> > ## ---------------------------------------------- ##
> > ## first, we look at the first time point, the BL ##
> > ## ---------------------------------------------- ##
> >
> > ## we build a cross table
> > tab_bl <-
> > dat %>%
> > dplyr::select(group, riskfactor) %>%
> > table
> > tab_bl
> > ## ,----
> > ## | riskfactor
> > ## | group risk norisk
> > ## | wt 35 174
> > ## | mut 170 96
> > ## `----
> >
> > ## and we test using fisher:
> > tab_bl %>% fisher.test
> > ## ,----
> > ## | Fisher's Exact Test for Count Data
> > ## |
> > ## | data: .
> > ## | p-value < 2.2e-16
> > ## | alternative hypothesis: true odds ratio is not equal to 1
> > ## | 95 percent confidence interval:
> > ## | 0.07099792 0.18002325
> > ## | sample estimates:
> > ## | odds ratio
> > ## | 0.1141677
> > ## `----
> > log(0.114)
> > ## ,----
> > ## | [1] -2.171557
> > ## `----
> >
> > ## so, we get a highly significant association of the riskfactor
> > ## and the group with an log(odds ratio) of -2.2
> >
> > ## we get the same result using logistic regression:
> > dat %>%
> > glm(group ~ riskfactor, family = "binomial", data = .) %>%
> > summary
> > ## ,----
> > ## |
> > ## | Call:
> > ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> > ## |
> > ## | Deviance Residuals:
> > ## | Min 1Q Median 3Q Max
> > ## | -1.8802 -0.9374 0.6119 0.6119 1.4381
> > ## |
> > ## | Coefficients:
> > ## | Estimate Std. Error z value Pr(>|z|)
> > ## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
> > ## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
> > ## | ---
> > ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > ## |
> > ## | (Dispersion parameter for binomial family taken to be 1)
> > ## |
> > ## | Null deviance: 651.63 on 474 degrees of freedom
> > ## | Residual deviance: 538.83 on 473 degrees of freedom
> > ## | AIC: 542.83
> > ## |
> > ## | Number of Fisher Scoring iterations: 4
> > ## `----
> >
> >
> >
> > ## ------------------------------------------------- ##
> > ## Now, we analyse both time points with interaction ##
> > ## ------------------------------------------------- ##
> >
> > dat %>%
> > glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
> family =
> > "binomial", data = .) %>%
> > summary
> > ## ,----
> > ## | Generalized linear mixed model fit by maximum likelihood (Laplace
> > ## | Approximation) [glmerMod]
> > ## | Family: binomial ( logit )
> > ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> > ## | Data: .
> > ## |
> > ## | AIC BIC logLik deviance df.resid
> > ## | 345.2 366.0 -167.6 335.2 470
> > ## |
> > ## | Scaled residuals:
> > ## | Min 1Q Median 3Q Max
> > ## | -0.095863 -0.058669 0.002278 0.002866 0.007324
> > ## |
> > ## | Random effects:
> > ## | Groups Name Variance Std.Dev.
> > ## | patient (Intercept) 1849 43
> > ## | Number of obs: 475, groups: patient, 265
> > ## |
> > ## | Fixed effects:
> > ## | Estimate Std. Error z value Pr(>|z|)
> > ## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
> > ## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
> > ## | fuFU 0.4596 1.9165 0.240 0.810
> > ## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
> > ## | ---
> > ## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > ## |
> > ## | Correlation of Fixed Effects:
> > ## | (Intr) rskfct fuFU
> > ## | rskfctrnrsk -0.746
> > ## | fuFU -0.513 0.510
> > ## | rskfctrn:FU 0.478 -0.576 -0.908
> > ## `----
> >
> > ## I get huge variation in the random effects
> > ##
> > ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
> > ## but one which is not significant
> > ---------- cut here --------------------------------------------
> > _______________________________________________
> > R-sig-mixed-models using r-project.org
> <mailto:R-sig-mixed-models using r-project.org> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> --
> Dr. Andreas Leha
> Head of the 'Core Facility
> Medical Biometry and Statistical Bioinformatics'
>
> UNIVERSITY MEDICAL CENTER GÖTTINGEN
> GEORG-AUGUST-UNIVERSITÄT
> Department of Medical Statistics
> Humboldtallee 32
> 37073 Göttingen
> Mailing Address: 37099 Göttingen, Germany
> Fax: +49 (0) 551 39-4995
> Tel: +49 (0) 551 39-4987
> http://www.ams.med.uni-goettingen.de/service-de.shtml
> _______________________________________________
> R-sig-mixed-models using r-project.org
> <mailto:R-sig-mixed-models using r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'
UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
More information about the R-sig-mixed-models
mailing list