[R-sig-ME] diverging results with and without random effects

Mon Nov 26 17:00:53 CET 2018

Dear Thierry,

thanks for looking into this!

So, one solution would be a baysian analysis, right?

Would you have a recommendation for me?

I followed [1] and used

  library("blme")
  dat %>%
    bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
           family = "binomial",
           data = .,
           fixef.prior = normal(cov = diag(9,4))) %>%
    summary

Which runs and gives the following fixed effect estimates:

  Fixed effects:
                        Estimate Std. Error z value Pr(>|z|)
  (Intercept)             8.2598     0.7445  11.094   <2e-16 ***
  riskfactornorisk      -16.0942     1.3085 -12.300   <2e-16 ***
  fuFU                    1.0019     1.0047   0.997    0.319
  riskfactornorisk:fuFU  -1.8675     1.2365  -1.510    0.131

These still do not seem reasonable.

Thanks in advance!

Regards,
Andreas

[1]
https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678

On 26/11/18 16:36, Thierry Onkelinx wrote:
> Dear Andreas,
> 
> This is due to quasi complete separatation. This occurs when all
> responses for a specific combination of levels are always TRUE or FALSE.
> In your case, you have only two observations per patient. Hence adding
> the patient as random effect, guarantees quasi complete separation issues.  
> 
> Best regards,
> 
> ir. Thierry Onkelinx
> Statisticus / Statistician
> 
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be <http://www.inbo.be>
> 
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data. ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
> 
> <https://www.inbo.be>
> 
> 
> Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas
> <andreas.leha using med.uni-goettingen.de
> <mailto:andreas.leha using med.uni-goettingen.de>>:
> 
>     Hi all,
> 
>     sent the wrong code (w/o filtering for BL).  If you want to look at the
>     data, please use this code:
> 
>     ---------- cut here --------------------------------------------
>     library("dplyr")
>     library("lme4")
>     library("lmerTest")
>     ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
>     library("pastebin")
> 
>     ## ---------------------------------- ##
>     ## load the data                      ##
>     ## ---------------------------------- ##
>     dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
>     "", .) %>% parse(text = .) %>% eval
> 
> 
> 
>     ## ---------------------------------- ##
>     ## have a look                        ##
>     ## ---------------------------------- ##
>     dat
>     ## ,----
>     ## | # A tibble: 475 x 4
>     ## |    patient group fu    riskfactor
>     ## |    <fct>   <fct> <fct> <fct>
>     ## |  1 p001    wt    BL    norisk
>     ## |  2 p002    wt    BL    norisk
>     ## |  3 p003    wt    BL    norisk
>     ## |  4 p004    wt    BL    norisk
>     ## |  5 p005    wt    BL    norisk
>     ## |  6 p006    wt    BL    norisk
>     ## |  7 p007    wt    BL    norisk
>     ## |  8 p008    wt    BL    norisk
>     ## |  9 p009    wt    BL    risk
>     ## | 10 p010    wt    BL    norisk
>     ## | # ... with 465 more rows
>     ## `----
>     dat %>% str
>     ## ,----
>     ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  475 obs. of  4
>     variables:
>     ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
>     8 9 10 ...
>     ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1
>     1 ...
>     ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1
>     1 ...
>     ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
>     1 2 ...
>     ## `----
> 
>     ## there are 265 patients
>     ## in 2 groups: "wt" and "mut"
>     ## with a dichotomous risk factor ("risk" and "norisk")
>     ## measured at two time points ("BL" and "FU")
> 
>     dat %>% summary
>     ## ,----
>     ## |     patient    group      fu       riskfactor
>     ## |  p001   :  2   wt :209   BL:258   risk  :205
>     ## |  p002   :  2   mut:266   FU:217   norisk:270
>     ## |  p003   :  2
>     ## |  p004   :  2
>     ## |  p005   :  2
>     ## |  p006   :  2
>     ## |  (Other):463
>     ## `----
> 
>     ## group sizes seem fine
> 
> 
> 
>     ## ---------------------------------------------- ##
>     ## first, we look at the first time point, the BL ##
>     ## ---------------------------------------------- ##
> 
>     ## we build a cross table
>     tab_bl <-
>       dat %>%
>       dplyr::filter(fu == "BL") %>%
>       dplyr::select(group, riskfactor) %>%
>       table
>     tab_bl
>     ## ,----
>     ## |      riskfactor
>     ## | group risk norisk
>     ## |   wt    22     86
>     ## |   mut   87     63
>     ## `----
> 
>     ## and we test using fisher:
>     tab_bl %>% fisher.test
>     ## ,----
>     ## |    Fisher's Exact Test for Count Data
>     ## |
>     ## | data:  .
>     ## | p-value = 1.18e-09
>     ## | alternative hypothesis: true odds ratio is not equal to 1
>     ## | 95 percent confidence interval:
>     ## |  0.09986548 0.33817966
>     ## | sample estimates:
>     ## | odds ratio
>     ## |  0.1865377
>     ## `----
>     log(0.187)
>     ## ,----
>     ## | [1] -1.676647
>     ## `----
> 
>     ## so, we get a highly significant association of the riskfactor
>     ## and the group with an log(odds ratio) of -1.7
> 
>     ## we get the same result using logistic regression:
>     dat %>%
>       filter(fu == "BL") %>%
>       glm(group ~ riskfactor, family = "binomial", data = .) %>%
>       summary
>     ## ,----
>     ## | Call:
>     ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
>     ## |
>     ## | Deviance Residuals:
>     ## |     Min       1Q   Median       3Q      Max
>     ## | -1.7890  -1.0484   0.6715   0.6715   1.3121
>     ## |
>     ## | Coefficients:
>     ## |                  Estimate Std. Error z value Pr(>|z|)
>     ## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
>     ## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
>     ## | ---
>     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>     ## |
>     ## | (Dispersion parameter for binomial family taken to be 1)
>     ## |
>     ## |     Null deviance: 350.80  on 257  degrees of freedom
>     ## | Residual deviance: 312.63  on 256  degrees of freedom
>     ## | AIC: 316.63
>     ## |
>     ## | Number of Fisher Scoring iterations: 4
>     ## `----
> 
> 
> 
>     ## ------------------------------------------------- ##
>     ## Now, we analyse both time points with interaction ##
>     ## ------------------------------------------------- ##
> 
>     dat %>%
>       glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
>     "binomial", data = .) %>%
>       summary
>     ## ,----
>     ## | Generalized linear mixed model fit by maximum likelihood (Laplace
>     ## |   Approximation) [glmerMod]
>     ## |  Family: binomial  ( logit )
>     ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
>     ## |    Data: .
>     ## |
>     ## |      AIC      BIC   logLik deviance df.resid
>     ## |    345.2    366.0   -167.6    335.2      470
>     ## |
>     ## | Scaled residuals:
>     ## |       Min        1Q    Median        3Q       Max
>     ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
>     ## |
>     ## | Random effects:
>     ## |  Groups  Name        Variance Std.Dev.
>     ## |  patient (Intercept) 1849     43
>     ## | Number of obs: 475, groups:  patient, 265
>     ## |
>     ## | Fixed effects:
>     ## |                       Estimate Std. Error z value Pr(>|z|)
>     ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
>     ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
>     ## | fuFU                    0.4596     1.9165   0.240    0.810
>     ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
>     ## | ---
>     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>     ## |
>     ## | Correlation of Fixed Effects:
>     ## |             (Intr) rskfct fuFU
>     ## | rskfctrnrsk -0.746
>     ## | fuFU        -0.513  0.510
>     ## | rskfctrn:FU  0.478 -0.576 -0.908
>     ## `----
> 
>     ## I get huge variation in the random effects
>     ##
>     ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
>     ## but one which is not significant
>     ---------- cut here --------------------------------------------
> 
> 
>     On 26/11/18 12:10, Leha, Andreas wrote:
>     > Hi all,
>     >
>     > I am interested in assessing the association of a (potential) risk
>     > factor to a (binary) grouping.
>     >
>     > I am having trouble with diverging results from modeling one time
>     point
>     > (without random effect) and modeling two time points (with random
>     effect).
>     >
>     > When analysing the first time point (base line, BL) only, I get a
>     highly
>     > significant association.
>     > Now, I want to see, whether there is an interaction between time and
>     > risk factor (the risk factor is not constant).  But when analysing
>     both
>     > time points, the estimated effect at BL is estimated to be not
>     significant.
>     >
>     > Now my simplified questions are:
>     > (1) Is there an association at BL or not?
>     > (2) How should I analyse both time points with this data?
>     >
>     > The aim is to look for confounding with other factors.  But I'd
>     like to
>     > understand the simple models before moving on.
>     >
>     > Below you find a reproducible example and the detailed results.
>     >
>     > Any suggestions would be highly appreciated!
>     >
>     > Regards,
>     > Andreas
>     >
>     >
>     >
>     > PS: The code / results
>     >
>     > ---------- cut here --------------------------------------------
>     > library("dplyr")
>     > library("lme4")
>     > library("lmerTest")
>     > ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
>     > library("pastebin")
>     >
>     > ## ---------------------------------- ##
>     > ## load the data                      ##
>     > ## ---------------------------------- ##
>     > dat <- pastebin::get_paste("Xgwgtb7j") %>%
>     >   as.character %>%
>     >   gsub("\r\n", "", .) %>%
>     >   parse(text = .) %>%
>     >   eval
>     >
>     >
>     >
>     > ## ---------------------------------- ##
>     > ## have a look                        ##
>     > ## ---------------------------------- ##
>     > dat
>     > ## ,----
>     > ## | # A tibble: 475 x 4
>     > ## |    patient group fu    riskfactor
>     > ## |    <fct>   <fct> <fct> <fct>
>     > ## |  1 p001    wt    BL    norisk
>     > ## |  2 p002    wt    BL    norisk
>     > ## |  3 p003    wt    BL    norisk
>     > ## |  4 p004    wt    BL    norisk
>     > ## |  5 p005    wt    BL    norisk
>     > ## |  6 p006    wt    BL    norisk
>     > ## |  7 p007    wt    BL    norisk
>     > ## |  8 p008    wt    BL    norisk
>     > ## |  9 p009    wt    BL    risk
>     > ## | 10 p010    wt    BL    norisk
>     > ## | # ... with 465 more rows
>     > ## `----
>     > dat %>% str
>     > ## ,----
>     > ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':        475 obs. of 
>     4 variables:
>     > ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4
>     5 6 7
>     > 8 9 10 ...
>     > ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1
>     1 1 ...
>     > ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1
>     1 1 ...
>     > ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2
>     2 2 2
>     > 1 2 ...
>     > ## `----
>     >
>     > ## there are 265 patients
>     > ## in 2 groups: "wt" and "mut"
>     > ## with a dichotomous risk factor ("risk" and "norisk")
>     > ## measured at two time points ("BL" and "FU")
>     >
>     > dat %>% summary
>     > ## ,----
>     > ## |     patient    group      fu       riskfactor
>     > ## |  p001   :  2   wt :209   BL:258   risk  :205
>     > ## |  p002   :  2   mut:266   FU:217   norisk:270
>     > ## |  p003   :  2
>     > ## |  p004   :  2
>     > ## |  p005   :  2
>     > ## |  p006   :  2
>     > ## |  (Other):463
>     > ## `----
>     >
>     > ## group sizes seem fine
>     >
>     >
>     >
>     > ## ---------------------------------------------- ##
>     > ## first, we look at the first time point, the BL ##
>     > ## ---------------------------------------------- ##
>     >
>     > ## we build a cross table
>     > tab_bl <-
>     >   dat %>%
>     >   dplyr::select(group, riskfactor) %>%
>     >   table
>     > tab_bl
>     > ## ,----
>     > ## |      riskfactor
>     > ## | group risk norisk
>     > ## |   wt    35    174
>     > ## |   mut  170     96
>     > ## `----
>     >
>     > ## and we test using fisher:
>     > tab_bl %>% fisher.test
>     > ## ,----
>     > ## |    Fisher's Exact Test for Count Data
>     > ## |
>     > ## | data:  .
>     > ## | p-value < 2.2e-16
>     > ## | alternative hypothesis: true odds ratio is not equal to 1
>     > ## | 95 percent confidence interval:
>     > ## |  0.07099792 0.18002325
>     > ## | sample estimates:
>     > ## | odds ratio
>     > ## |  0.1141677
>     > ## `----
>     > log(0.114)
>     > ## ,----
>     > ## | [1] -2.171557
>     > ## `----
>     >
>     > ## so, we get a highly significant association of the riskfactor
>     > ## and the group with an log(odds ratio) of -2.2
>     >
>     > ## we get the same result using logistic regression:
>     > dat %>%
>     >   glm(group ~ riskfactor, family = "binomial", data = .) %>%
>     >   summary
>     > ## ,----
>     > ## |
>     > ## | Call:
>     > ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
>     > ## |
>     > ## | Deviance Residuals:
>     > ## |     Min       1Q   Median       3Q      Max
>     > ## | -1.8802  -0.9374   0.6119   0.6119   1.4381
>     > ## |
>     > ## | Coefficients:
>     > ## |                  Estimate Std. Error z value Pr(>|z|)
>     > ## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
>     > ## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
>     > ## | ---
>     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>     > ## |
>     > ## | (Dispersion parameter for binomial family taken to be 1)
>     > ## |
>     > ## |     Null deviance: 651.63  on 474  degrees of freedom
>     > ## | Residual deviance: 538.83  on 473  degrees of freedom
>     > ## | AIC: 542.83
>     > ## |
>     > ## | Number of Fisher Scoring iterations: 4
>     > ## `----
>     >
>     >
>     >
>     > ## ------------------------------------------------- ##
>     > ## Now, we analyse both time points with interaction ##
>     > ## ------------------------------------------------- ##
>     >
>     > dat %>%
>     >   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
>     family =
>     > "binomial", data = .) %>%
>     >   summary
>     > ## ,----
>     > ## | Generalized linear mixed model fit by maximum likelihood (Laplace
>     > ## |   Approximation) [glmerMod]
>     > ## |  Family: binomial  ( logit )
>     > ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
>     > ## |    Data: .
>     > ## |
>     > ## |      AIC      BIC   logLik deviance df.resid
>     > ## |    345.2    366.0   -167.6    335.2      470
>     > ## |
>     > ## | Scaled residuals:
>     > ## |       Min        1Q    Median        3Q       Max
>     > ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
>     > ## |
>     > ## | Random effects:
>     > ## |  Groups  Name        Variance Std.Dev.
>     > ## |  patient (Intercept) 1849     43
>     > ## | Number of obs: 475, groups:  patient, 265
>     > ## |
>     > ## | Fixed effects:
>     > ## |                       Estimate Std. Error z value Pr(>|z|)
>     > ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
>     > ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
>     > ## | fuFU                    0.4596     1.9165   0.240    0.810
>     > ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
>     > ## | ---
>     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>     > ## |
>     > ## | Correlation of Fixed Effects:
>     > ## |             (Intr) rskfct fuFU
>     > ## | rskfctrnrsk -0.746
>     > ## | fuFU        -0.513  0.510
>     > ## | rskfctrn:FU  0.478 -0.576 -0.908
>     > ## `----
>     >
>     > ## I get huge variation in the random effects
>     > ##
>     > ## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
>     > ## but one which is not significant
>     > ---------- cut here --------------------------------------------
>     > _______________________________________________
>     > R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     >
> 
>     -- 
>     Dr. Andreas Leha
>     Head of the 'Core Facility
>     Medical Biometry and Statistical Bioinformatics'
> 
>     UNIVERSITY MEDICAL CENTER GÖTTINGEN
>     GEORG-AUGUST-UNIVERSITÄT
>     Department of Medical Statistics
>     Humboldtallee 32
>     37073 Göttingen
>     Mailing Address: 37099 Göttingen, Germany
>     Fax: +49 (0) 551 39-4995
>     Tel: +49 (0) 551 39-4987
>     http://www.ams.med.uni-goettingen.de/service-de.shtml
>     _______________________________________________
>     R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 

-- 
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'

UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml