[R-sig-ME] diverging results with and without random effects

Mon Nov 26 17:05:43 CET 2018

Dear Andreas,

You'll need a very informative prior for the random intercept variance in
order to keep the random intercepts reasonable small.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

Op ma 26 nov. 2018 om 17:00 schreef Leha, Andreas <
andreas.leha using med.uni-goettingen.de>:

> Dear Thierry,
>
> thanks for looking into this!
>
> So, one solution would be a baysian analysis, right?
>
> Would you have a recommendation for me?
>
> I followed [1] and used
>
>   library("blme")
>   dat %>%
>     bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
>            family = "binomial",
>            data = .,
>            fixef.prior = normal(cov = diag(9,4))) %>%
>     summary
>
> Which runs and gives the following fixed effect estimates:
>
>
>   Fixed effects:
>                         Estimate Std. Error z value Pr(>|z|)
>   (Intercept)             8.2598     0.7445  11.094   <2e-16 ***
>   riskfactornorisk      -16.0942     1.3085 -12.300   <2e-16 ***
>   fuFU                    1.0019     1.0047   0.997    0.319
>   riskfactornorisk:fuFU  -1.8675     1.2365  -1.510    0.131
>
>
> These still do not seem reasonable.
>
> Thanks in advance!
>
> Regards,
> Andreas
>
>
> [1]
>
> https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678
>
>
> On 26/11/18 16:36, Thierry Onkelinx wrote:
> > Dear Andreas,
> >
> > This is due to quasi complete separatation. This occurs when all
> > responses for a specific combination of levels are always TRUE or FALSE.
> > In your case, you have only two observations per patient. Hence adding
> > the patient as random effect, guarantees quasi complete separation
> issues.
> >
> > Best regards,
> >
> > ir. Thierry Onkelinx
> > Statisticus / Statistician
> >
> > Vlaamse Overheid / Government of Flanders
> > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
> > AND FOREST
> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
> > Havenlaan 88 bus 73, 1000 Brussel
> > www.inbo.be <http://www.inbo.be>
> >
> >
> ///////////////////////////////////////////////////////////////////////////////////////////
> > To call in the statistician after the experiment is done may be no more
> > than asking him to perform a post-mortem examination: he may be able to
> > say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> > The plural of anecdote is not data. ~ Roger Brinner
> > The combination of some data and an aching desire for an answer does not
> > ensure that a reasonable answer can be extracted from a given body of
> > data. ~ John Tukey
> >
> ///////////////////////////////////////////////////////////////////////////////////////////
> >
> > <https://www.inbo.be>
> >
> >
> > Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas
> > <andreas.leha using med.uni-goettingen.de
> > <mailto:andreas.leha using med.uni-goettingen.de>>:
> >
> >     Hi all,
> >
> >     sent the wrong code (w/o filtering for BL).  If you want to look at
> the
> >     data, please use this code:
> >
> >     ---------- cut here --------------------------------------------
> >     library("dplyr")
> >     library("lme4")
> >     library("lmerTest")
> >     ## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
> >     library("pastebin")
> >
> >     ## ---------------------------------- ##
> >     ## load the data                      ##
> >     ## ---------------------------------- ##
> >     dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>%
> gsub("\r\n",
> >     "", .) %>% parse(text = .) %>% eval
> >
> >
> >
> >     ## ---------------------------------- ##
> >     ## have a look                        ##
> >     ## ---------------------------------- ##
> >     dat
> >     ## ,----
> >     ## | # A tibble: 475 x 4
> >     ## |    patient group fu    riskfactor
> >     ## |    <fct>   <fct> <fct> <fct>
> >     ## |  1 p001    wt    BL    norisk
> >     ## |  2 p002    wt    BL    norisk
> >     ## |  3 p003    wt    BL    norisk
> >     ## |  4 p004    wt    BL    norisk
> >     ## |  5 p005    wt    BL    norisk
> >     ## |  6 p006    wt    BL    norisk
> >     ## |  7 p007    wt    BL    norisk
> >     ## |  8 p008    wt    BL    norisk
> >     ## |  9 p009    wt    BL    risk
> >     ## | 10 p010    wt    BL    norisk
> >     ## | # ... with 465 more rows
> >     ## `----
> >     dat %>% str
> >     ## ,----
> >     ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  475 obs. of  4
> >     variables:
> >     ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5
> 6 7
> >     8 9 10 ...
> >     ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1
> >     1 ...
> >     ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1
> >     1 ...
> >     ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2
> 2 2
> >     1 2 ...
> >     ## `----
> >
> >     ## there are 265 patients
> >     ## in 2 groups: "wt" and "mut"
> >     ## with a dichotomous risk factor ("risk" and "norisk")
> >     ## measured at two time points ("BL" and "FU")
> >
> >     dat %>% summary
> >     ## ,----
> >     ## |     patient    group      fu       riskfactor
> >     ## |  p001   :  2   wt :209   BL:258   risk  :205
> >     ## |  p002   :  2   mut:266   FU:217   norisk:270
> >     ## |  p003   :  2
> >     ## |  p004   :  2
> >     ## |  p005   :  2
> >     ## |  p006   :  2
> >     ## |  (Other):463
> >     ## `----
> >
> >     ## group sizes seem fine
> >
> >
> >
> >     ## ---------------------------------------------- ##
> >     ## first, we look at the first time point, the BL ##
> >     ## ---------------------------------------------- ##
> >
> >     ## we build a cross table
> >     tab_bl <-
> >       dat %>%
> >       dplyr::filter(fu == "BL") %>%
> >       dplyr::select(group, riskfactor) %>%
> >       table
> >     tab_bl
> >     ## ,----
> >     ## |      riskfactor
> >     ## | group risk norisk
> >     ## |   wt    22     86
> >     ## |   mut   87     63
> >     ## `----
> >
> >     ## and we test using fisher:
> >     tab_bl %>% fisher.test
> >     ## ,----
> >     ## |    Fisher's Exact Test for Count Data
> >     ## |
> >     ## | data:  .
> >     ## | p-value = 1.18e-09
> >     ## | alternative hypothesis: true odds ratio is not equal to 1
> >     ## | 95 percent confidence interval:
> >     ## |  0.09986548 0.33817966
> >     ## | sample estimates:
> >     ## | odds ratio
> >     ## |  0.1865377
> >     ## `----
> >     log(0.187)
> >     ## ,----
> >     ## | [1] -1.676647
> >     ## `----
> >
> >     ## so, we get a highly significant association of the riskfactor
> >     ## and the group with an log(odds ratio) of -1.7
> >
> >     ## we get the same result using logistic regression:
> >     dat %>%
> >       filter(fu == "BL") %>%
> >       glm(group ~ riskfactor, family = "binomial", data = .) %>%
> >       summary
> >     ## ,----
> >     ## | Call:
> >     ## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
> >     ## |
> >     ## | Deviance Residuals:
> >     ## |     Min       1Q   Median       3Q      Max
> >     ## | -1.7890  -1.0484   0.6715   0.6715   1.3121
> >     ## |
> >     ## | Coefficients:
> >     ## |                  Estimate Std. Error z value Pr(>|z|)
> >     ## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
> >     ## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
> >     ## | ---
> >     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >     ## |
> >     ## | (Dispersion parameter for binomial family taken to be 1)
> >     ## |
> >     ## |     Null deviance: 350.80  on 257  degrees of freedom
> >     ## | Residual deviance: 312.63  on 256  degrees of freedom
> >     ## | AIC: 316.63
> >     ## |
> >     ## | Number of Fisher Scoring iterations: 4
> >     ## `----
> >
> >
> >
> >     ## ------------------------------------------------- ##
> >     ## Now, we analyse both time points with interaction ##
> >     ## ------------------------------------------------- ##
> >
> >     dat %>%
> >       glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
> family =
> >     "binomial", data = .) %>%
> >       summary
> >     ## ,----
> >     ## | Generalized linear mixed model fit by maximum likelihood
> (Laplace
> >     ## |   Approximation) [glmerMod]
> >     ## |  Family: binomial  ( logit )
> >     ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
> >     ## |    Data: .
> >     ## |
> >     ## |      AIC      BIC   logLik deviance df.resid
> >     ## |    345.2    366.0   -167.6    335.2      470
> >     ## |
> >     ## | Scaled residuals:
> >     ## |       Min        1Q    Median        3Q       Max
> >     ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
> >     ## |
> >     ## | Random effects:
> >     ## |  Groups  Name        Variance Std.Dev.
> >     ## |  patient (Intercept) 1849     43
> >     ## | Number of obs: 475, groups:  patient, 265
> >     ## |
> >     ## | Fixed effects:
> >     ## |                       Estimate Std. Error z value Pr(>|z|)
> >     ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
> >     ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
> >     ## | fuFU                    0.4596     1.9165   0.240    0.810
> >     ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
> >     ## | ---
> >     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >     ## |
> >     ## | Correlation of Fixed Effects:
> >     ## |             (Intr) rskfct fuFU
> >     ## | rskfctrnrsk -0.746
> >     ## | fuFU        -0.513  0.510
> >     ## | rskfctrn:FU  0.478 -0.576 -0.908
> >     ## `----
> >
> >     ## I get huge variation in the random effects
> >     ##
> >     ## And the risk factor at BL gets an estimated log(odds ratio) of
> -1.6
> >     ## but one which is not significant
> >     ---------- cut here --------------------------------------------
> >
> >
> >     On 26/11/18 12:10, Leha, Andreas wrote:
> >     > Hi all,
> >     >
> >     > I am interested in assessing the association of a (potential) risk
> >     > factor to a (binary) grouping.
> >     >
> >     > I am having trouble with diverging results from modeling one time
> >     point
> >     > (without random effect) and modeling two time points (with random
> >     effect).
> >     >
> >     > When analysing the first time point (base line, BL) only, I get a
> >     highly
> >     > significant association.
> >     > Now, I want to see, whether there is an interaction between time
> and
> >     > risk factor (the risk factor is not constant).  But when analysing
> >     both
> >     > time points, the estimated effect at BL is estimated to be not
> >     significant.
> >     >
> >     > Now my simplified questions are:
> >     > (1) Is there an association at BL or not?
> >     > (2) How should I analyse both time points with this data?
> >     >
> >     > The aim is to look for confounding with other factors.  But I'd
> >     like to
> >     > understand the simple models before moving on.
> >     >
> >     > Below you find a reproducible example and the detailed results.
> >     >
> >     > Any suggestions would be highly appreciated!
> >     >
> >     > Regards,
> >     > Andreas
> >     >
> >     >
> >     >
> >     > PS: The code / results
> >     >
> >     > ---------- cut here --------------------------------------------
> >     > library("dplyr")
> >     > library("lme4")
> >     > library("lmerTest")
> >     > ## install_github("hrbrmstr/pastebin", upgrade_dependencies =
> FALSE)
> >     > library("pastebin")
> >     >
> >     > ## ---------------------------------- ##
> >     > ## load the data                      ##
> >     > ## ---------------------------------- ##
> >     > dat <- pastebin::get_paste("Xgwgtb7j") %>%
> >     >   as.character %>%
> >     >   gsub("\r\n", "", .) %>%
> >     >   parse(text = .) %>%
> >     >   eval
> >     >
> >     >
> >     >
> >     > ## ---------------------------------- ##
> >     > ## have a look                        ##
> >     > ## ---------------------------------- ##
> >     > dat
> >     > ## ,----
> >     > ## | # A tibble: 475 x 4
> >     > ## |    patient group fu    riskfactor
> >     > ## |    <fct>   <fct> <fct> <fct>
> >     > ## |  1 p001    wt    BL    norisk
> >     > ## |  2 p002    wt    BL    norisk
> >     > ## |  3 p003    wt    BL    norisk
> >     > ## |  4 p004    wt    BL    norisk
> >     > ## |  5 p005    wt    BL    norisk
> >     > ## |  6 p006    wt    BL    norisk
> >     > ## |  7 p007    wt    BL    norisk
> >     > ## |  8 p008    wt    BL    norisk
> >     > ## |  9 p009    wt    BL    risk
> >     > ## | 10 p010    wt    BL    norisk
> >     > ## | # ... with 465 more rows
> >     > ## `----
> >     > dat %>% str
> >     > ## ,----
> >     > ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':        475 obs. of
> >     4 variables:
> >     > ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4
> >     5 6 7
> >     > 8 9 10 ...
> >     > ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1
> >     1 1 ...
> >     > ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1
> >     1 1 ...
> >     > ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2
> >     2 2 2
> >     > 1 2 ...
> >     > ## `----
> >     >
> >     > ## there are 265 patients
> >     > ## in 2 groups: "wt" and "mut"
> >     > ## with a dichotomous risk factor ("risk" and "norisk")
> >     > ## measured at two time points ("BL" and "FU")
> >     >
> >     > dat %>% summary
> >     > ## ,----
> >     > ## |     patient    group      fu       riskfactor
> >     > ## |  p001   :  2   wt :209   BL:258   risk  :205
> >     > ## |  p002   :  2   mut:266   FU:217   norisk:270
> >     > ## |  p003   :  2
> >     > ## |  p004   :  2
> >     > ## |  p005   :  2
> >     > ## |  p006   :  2
> >     > ## |  (Other):463
> >     > ## `----
> >     >
> >     > ## group sizes seem fine
> >     >
> >     >
> >     >
> >     > ## ---------------------------------------------- ##
> >     > ## first, we look at the first time point, the BL ##
> >     > ## ---------------------------------------------- ##
> >     >
> >     > ## we build a cross table
> >     > tab_bl <-
> >     >   dat %>%
> >     >   dplyr::select(group, riskfactor) %>%
> >     >   table
> >     > tab_bl
> >     > ## ,----
> >     > ## |      riskfactor
> >     > ## | group risk norisk
> >     > ## |   wt    35    174
> >     > ## |   mut  170     96
> >     > ## `----
> >     >
> >     > ## and we test using fisher:
> >     > tab_bl %>% fisher.test
> >     > ## ,----
> >     > ## |    Fisher's Exact Test for Count Data
> >     > ## |
> >     > ## | data:  .
> >     > ## | p-value < 2.2e-16
> >     > ## | alternative hypothesis: true odds ratio is not equal to 1
> >     > ## | 95 percent confidence interval:
> >     > ## |  0.07099792 0.18002325
> >     > ## | sample estimates:
> >     > ## | odds ratio
> >     > ## |  0.1141677
> >     > ## `----
> >     > log(0.114)
> >     > ## ,----
> >     > ## | [1] -2.171557
> >     > ## `----
> >     >
> >     > ## so, we get a highly significant association of the riskfactor
> >     > ## and the group with an log(odds ratio) of -2.2
> >     >
> >     > ## we get the same result using logistic regression:
> >     > dat %>%
> >     >   glm(group ~ riskfactor, family = "binomial", data = .) %>%
> >     >   summary
> >     > ## ,----
> >     > ## |
> >     > ## | Call:
> >     > ## | glm(formula = group ~ riskfactor, family = "binomial", data =
> .)
> >     > ## |
> >     > ## | Deviance Residuals:
> >     > ## |     Min       1Q   Median       3Q      Max
> >     > ## | -1.8802  -0.9374   0.6119   0.6119   1.4381
> >     > ## |
> >     > ## | Coefficients:
> >     > ## |                  Estimate Std. Error z value Pr(>|z|)
> >     > ## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
> >     > ## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
> >     > ## | ---
> >     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >     > ## |
> >     > ## | (Dispersion parameter for binomial family taken to be 1)
> >     > ## |
> >     > ## |     Null deviance: 651.63  on 474  degrees of freedom
> >     > ## | Residual deviance: 538.83  on 473  degrees of freedom
> >     > ## | AIC: 542.83
> >     > ## |
> >     > ## | Number of Fisher Scoring iterations: 4
> >     > ## `----
> >     >
> >     >
> >     >
> >     > ## ------------------------------------------------- ##
> >     > ## Now, we analyse both time points with interaction ##
> >     > ## ------------------------------------------------- ##
> >     >
> >     > dat %>%
> >     >   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
> >     family =
> >     > "binomial", data = .) %>%
> >     >   summary
> >     > ## ,----
> >     > ## | Generalized linear mixed model fit by maximum likelihood
> (Laplace
> >     > ## |   Approximation) [glmerMod]
> >     > ## |  Family: binomial  ( logit )
> >     > ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 |
> patient)
> >     > ## |    Data: .
> >     > ## |
> >     > ## |      AIC      BIC   logLik deviance df.resid
> >     > ## |    345.2    366.0   -167.6    335.2      470
> >     > ## |
> >     > ## | Scaled residuals:
> >     > ## |       Min        1Q    Median        3Q       Max
> >     > ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
> >     > ## |
> >     > ## | Random effects:
> >     > ## |  Groups  Name        Variance Std.Dev.
> >     > ## |  patient (Intercept) 1849     43
> >     > ## | Number of obs: 475, groups:  patient, 265
> >     > ## |
> >     > ## | Fixed effects:
> >     > ## |                       Estimate Std. Error z value Pr(>|z|)
> >     > ## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
> >     > ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
> >     > ## | fuFU                    0.4596     1.9165   0.240    0.810
> >     > ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
> >     > ## | ---
> >     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >     > ## |
> >     > ## | Correlation of Fixed Effects:
> >     > ## |             (Intr) rskfct fuFU
> >     > ## | rskfctrnrsk -0.746
> >     > ## | fuFU        -0.513  0.510
> >     > ## | rskfctrn:FU  0.478 -0.576 -0.908
> >     > ## `----
> >     >
> >     > ## I get huge variation in the random effects
> >     > ##
> >     > ## And the risk factor at BL gets an estimated log(odds ratio) of
> -1.6
> >     > ## but one which is not significant
> >     > ---------- cut here --------------------------------------------
> >     > _______________________________________________
> >     > R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org> mailing list
> >     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     >
> >
> >     --
> >     Dr. Andreas Leha
> >     Head of the 'Core Facility
> >     Medical Biometry and Statistical Bioinformatics'
> >
> >     UNIVERSITY MEDICAL CENTER GÖTTINGEN
> >     GEORG-AUGUST-UNIVERSITÄT
> >     Department of Medical Statistics
> >     Humboldtallee 32
> >     37073 Göttingen
> >     Mailing Address: 37099 Göttingen, Germany
> >     Fax: +49 (0) 551 39-4995
> >     Tel: +49 (0) 551 39-4987
> >     http://www.ams.med.uni-goettingen.de/service-de.shtml
> >     _______________________________________________
> >     R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org> mailing list
> >     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> --
> Dr. Andreas Leha
> Head of the 'Core Facility
> Medical Biometry and Statistical Bioinformatics'
>
> UNIVERSITY MEDICAL CENTER GÖTTINGEN
> GEORG-AUGUST-UNIVERSITÄT
> Department of Medical Statistics
> Humboldtallee 32
> 37073 Göttingen
> Mailing Address: 37099 Göttingen, Germany
> Fax: +49 (0) 551 39-4995
> Tel: +49 (0) 551 39-4987
> http://www.ams.med.uni-goettingen.de/service-de.shtml

	[[alternative HTML version deleted]]