[R-sig-ME] fixed effects estimates too small in binomial GLMM with low "success"-rate

Solis t.o.lentz at uu.nl
Mon May 19 12:22:01 CEST 2014


Dear Ben and others on the list, 

The 'possibly neglected' mail exchange is probably this one:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q1/021919.html

I hope the link helps (I'm about to board a plane), but I'm excited to see that my question has been picked up and might have an answer. 

Kind regards,

Tom

> On 18 May 2014 at 12:00, <r-sig-mixed-models-request at r-project.org> wrote:
> 
> Send R-sig-mixed-models mailing list submissions to
>    r-sig-mixed-models at r-project.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> or, via email, send a message with subject or body 'help' to
>    r-sig-mixed-models-request at r-project.org
> 
> You can reach the person managing the list at
>    r-sig-mixed-models-owner at r-project.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-sig-mixed-models digest..."
> 
> 
> Today's Topics:
> 
>   1. fixed effects estimates too small in binomial GLMM with low
>      "success"-rate (Henrik Singmann)
>   2. Re: fixed effects estimates too small in binomial GLMM with
>      low "success"-rate (Ben Bolker)
>   3. Re: fixed effects estimates too small in binomial GLMM with
>      low "success"-rate (Henrik Singmann)
>   4. "parm" parameter in lme4::confint.merMod (Jake Westfall)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sun, 18 May 2014 00:07:13 +0200
> From: Henrik Singmann <henrik.singmann at psychologie.uni-freiburg.de>
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] fixed effects estimates too small in binomial GLMM
>    with low "success"-rate
> Message-ID: <5377DD91.9030705 at psychologie.uni-freiburg.de>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
> 
> Dear list,
> 
> I would really appreciate your input on an issue in which the fixed effects estimates of a binomial GLMM with a relatively low rate of "successes" are too low (compared to the observed effect) by around 0.5 percentage points.
> 
> The data come from six comparable experiments in which we measured participants' performance errors across a series of trials, in a design with one two-level factor ("a" and "b"). Overall, the difference in error rate between the two levels was very small (~0.5 percentage points), but when analyzing all the data (with participant and experiment as random effects) the effect of the factor reached significance (p = .03), which is also consistent with other published data.
> 
> However, the estimated effects are around 0.5 percentage points lower than the observed mean error rates, indicating that the model may not adequately describe the data. Furthermore, I also get a convergence warning ("Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)").
> 
> ####### code 1 ##########
> require(lme4)
> dat <- read.table("http://pastebin.com/raw.php?i=KWjD5EG3")
> dat$experiment <- factor(dat$experiment)
> 
> # observed effects
> aggregate(errors ~ factor, dat, mean) #unweighted
> ##   factor     errors
> ## 1      a 0.02266914
> ## 2      b 0.02772540
> by(dat, dat$factor, function(x) with(x, sum(errors*weights)/sum(weights))) #weighted
> ## dat$factor: a
> ## [1] 0.0222343
> ## ------------------------------------------------------------
> ## dat$factor: b
> ## [1] 0.02777651
> 
> m1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), dat, weights = dat$weights, family = binomial)
> ## Warning message:
> ## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
> ##   Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)
> coef(summary(m1))[2,, drop = FALSE]
> ##          Estimate Std. Error  z value   Pr(>|z|)
> ## factorb 0.1788881 0.08005533 2.234556 0.02544653
> 
> # estimated effect on the response scale:
> logistic <- function(x) 1/(1+exp(-x))  # inverse logit, same as plogis()
> logistic(cumsum(fixef(m1)))  # cumsum turns (intercept, difference) into the logits for a and b
> ## (Intercept)     factorb
> ##  0.01825229  0.02174991
> 
> ####### end code 1 ##########
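> 
> As a quick check on the convergence warning, one can refit with a
> different optimizer and compare the estimates (a sketch; update() and
> glmerControl() are part of lme4, and "bobyqa" is just one alternative
> optimizer):
> 
> m1b <- update(m1, control = glmerControl(optimizer = "bobyqa"))
> max(abs(fixef(m1b) - fixef(m1)))  # near-zero differences suggest the
>                                   # warning is a false positive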
> 
> Interestingly, I can replicate this behavior when simulating binomial data similar to mine. When the overall mean is relatively small, the estimates from the GLMM are consistently too low. When the overall mean is increased, this consistent bias in the estimated effect seems to go away (and the observed effect gets larger due to the non-linear nature of the logistic model). The code for simulating one data set follows below.
> 
> My question is: Does anyone have an idea what I could do to achieve better agreement between observed and predicted values? I tried different link functions, but that did not help, and I am unsure whether I can trust my model given this discrepancy.
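> 
> (For comparison: the conditional fitted values, which incorporate the
> estimated random effects, can be averaged within each factor level; in a
> sketch like the following they should sit much closer to the observed
> means than the inverse link of the fixed effects alone.)
> 
> # weighted means of the conditional fitted probabilities per level
> tapply(fitted(m1) * dat$weights, dat$factor, sum) /
>   tapply(dat$weights, dat$factor, sum)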
> 
> Thanks a lot,
> Henrik
> 
> 
> ##### code 2 #####
> 
> mu <- -3.7  # when larger, problem disappears
> n_experiments <- 6
> n_participants <- 20
> n_trials <- 400
> sigma_effect <- 0.2
> effect_size <- 0.4
> sigma_p <- 0.7
> sigma_e <- 0.1
> prob_level <- 0.5
> 
> # create data:
> df <- expand.grid(experiment = factor(seq(1, n_experiments)), id = factor(seq(1, n_participants)), factor = c(-1, 1))
> df$id <- with(df, experiment:id)
> df <- merge(df, data.frame(experiment = seq(1, n_experiments), re_e = rnorm(n_experiments, sd = sigma_e)))
> df <- merge(df, data.frame(id = levels(df$id), re_p = rnorm(length(levels(df$id)), sd = sigma_p)))
> df <- merge(df, data.frame(id = levels(df$id), re_a = rnorm(length(levels(df$id)), sd = sigma_effect)))
> df <- with(df, df[order(id, factor),])
> # randomly split each participant's n_trials across the two factor levels
> tmp <- rbinom(n_experiments*n_participants, n_trials, prob_level)
> tmp <- rep(tmp, each = 2)
> tmp[c(FALSE, TRUE)] <- n_trials-tmp[c(FALSE, TRUE)]  # each pair sums to n_trials
> # simulate each cell's error proportion from the logistic model
> df$errors <- mapply(function(x, y) rbinom(1, y, x),
>                     with(df, logistic(mu + re_p + factor*(effect_size/2) + factor*re_a + re_e)),
>                     tmp)/tmp
> df$weights <- tmp
> df$factor <- factor(df$factor)
> 
> aggregate(errors ~ factor, df, mean) #unweighted
> ##   factor     errors
> ## 1     -1 0.02352719
> ## 2      1 0.03718523
> 
> s1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), df, weights = df$weights, family = binomial)
> ## Warning message:
> ## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
> ##   Model failed to converge with max|grad| = 0.0130648 (tol = 0.001)
> coef(summary(s1))[2,, drop = FALSE]
> #          Estimate Std. Error  z value     Pr(>|z|)
> # factor1 0.4765694 0.07578773 6.288213 3.211416e-10
> 
> logistic(cumsum(fixef(s1)))
> # (Intercept)     factor1
> #  0.01849934  0.02946117
> 
> #### end code 2 ####
> 
> 
> -- 
> Dr. Henrik Singmann
> Albert-Ludwigs-Universität Freiburg, Germany
> http://www.psychologie.uni-freiburg.de/Members/singmann
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sat, 17 May 2014 18:19:05 -0400
> From: Ben Bolker <bbolker at gmail.com>
> To: Henrik Singmann <henrik.singmann at psychologie.uni-freiburg.de>,
>    r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] fixed effects estimates too small in binomial
>    GLMM with low "success"-rate
> Message-ID: <5377E059.2050800 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-15
> 
>> On 14-05-17 06:07 PM, Henrik Singmann wrote:
>> Dear list,
>> 
>> I would really appreciate your input on an issue in which the fixed
>> effects estimates of a binomial GLMM with a relatively low rate of
>> "successes" are too low (compared to the observed effect) by around 0.5
>> percentage points.
> 
>  Don't know, but one comment and one thought:
> 
>  (1) the convergence warning goes away with the most recent version
> of lme4 (1.1-7)
> 
>  (2) I'm _guessing_ that you will find that this phenomenon is
> due to Jensen's inequality (i.e., a nonlinear averaging effect);
> that would mean its magnitude should be proportional to the
> random-effects variances and to the curvature of the nonlinear
> (inverse-link) relationship.
>  (I think there was a similar, possibly neglected, question along
> these lines on the mailing list earlier ...)
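> 
>  A minimal sketch of that effect (the intercept of -3.7 and the SD of
> 0.7 below are assumed, plausible values on the logit scale, not taken
> from the fitted model):
> 
> set.seed(1)
> u <- rnorm(1e6, mean = 0, sd = 0.7)  # random effects on the logit scale
> plogis(-3.7)             # probability for the average (median) subject
> mean(plogis(-3.7 + u))   # population-averaged probability: noticeably larger
> # the GLMM fixed effects live on the first (conditional) scale, while the
> # observed mean error rates live on the second (marginal) scale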
> 
>  Ben Bolker
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Sun, 18 May 2014 00:38:20 +0200
> From: Henrik Singmann <henrik.singmann at psychologie.uni-freiburg.de>
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] fixed effects estimates too small in binomial
>    GLMM with low "success"-rate
> Message-ID: <5377E4DC.6080809 at psychologie.uni-freiburg.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> 
> 
> On 18.05.2014 00:19, Ben Bolker wrote:
>> On 14-05-17 06:07 PM, Henrik Singmann wrote:
>>> Dear list,
>>> 
>>> I would really appreciate your input on an issue in which the fixed
>>> effects estimates of a binomial GLMM with a relatively low rate of
>>> "successes" are too low (compared to the observed effect) by around 0.5
>>> percentage points.
>> 
>>   Don't know, but one comment and one thought:
>> 
>>   (1) the convergence warning goes away with the most recent version
>> of lme4 (1.1-7)
> 
> This is true, thanks.
>> 
>>   (2) I'm _guessing_ that you will find that this phenomenon is
>> due to Jensen's inequality (i.e., a nonlinear averaging effect);
>> that would mean its magnitude should be proportional to the
>> random-effects variances and to the curvature of the nonlinear
>> (inverse-link) relationship.
>>   (I think there was a similar, possibly neglected, question along
>> these lines on the mailing list earlier ...)
> 
> Thanks for the pointer. In fact, when entering experiment as a fixed effect (which removes the random effect with the smallest variance), the estimates are much closer to the observed values; they are almost perfect. Perhaps I should use this approach, as sketched below (although I would prefer treating experiment as random). Or what would you do?
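> 
> For reference, a sketch of that variant (using the dat, weights, and
> logistic() from the first message; experiment enters with its default
> treatment coding):
> 
> m2 <- glmer(errors ~ factor + experiment + (factor|id),
>             dat, weights = dat$weights, family = binomial)
> coef(summary(m2))["factorb", , drop = FALSE]
> logistic(cumsum(fixef(m2)[1:2]))  # fitted rates for the reference experiment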
> 
> 
>> 
>>   Ben Bolker
> 
> -- 
> Dr. Henrik Singmann
> Albert-Ludwigs-Universität Freiburg, Germany
> http://www.psychologie.uni-freiburg.de/Members/singmann
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Sun, 18 May 2014 00:03:52 -0600
> From: Jake Westfall <jake987722 at hotmail.com>
> To: "r-sig-mixed-models at r-project.org"
>    <r-sig-mixed-models at r-project.org>
> Subject: [R-sig-ME] "parm" parameter in lme4::confint.merMod
> Message-ID: <BAY172-W2375062D4CFCE0B6E5F9CBCB330 at phx.gbl>
> Content-Type: text/plain
> 
> Hi all,
> 
> According to the documentation, confint.merMod() lets you use the "parm" argument to get confidence intervals for only a specified subset of parameters, by giving the integer positions of the desired parameters. But how is one supposed to know which integer positions correspond to which parameters? I poked around a little but found nothing in the methods or slots of the merMod object that indicates this, and nothing in the documentation detailing it. Have I missed something? I guess you can call the function with that argument left at its default, and the order of the parameters in the output will correspond to the integer positions, but that approach would seem to defeat the point of being able to request particular subsets of parameters...
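> 
> One workaround (a sketch, assuming a fitted merMod object such as the m1
> from the earlier thread; whether parm also accepts parameter names or the
> "beta_"/"theta_" shortcuts depends on the lme4 version):
> 
> ci_all <- confint(m1)  # profiles everything once (slow), in positional order
> rownames(ci_all)       # variance parameters (".sig01", ...) come first,
>                        # followed by the fixed effects
> # with the names and positions known, later calls can profile just a subset:
> confint(m1, parm = which(rownames(ci_all) == "factorb"))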
> 
> Jake
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> R-sig-mixed-models mailing list
> R-sig-mixed-models at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> End of R-sig-mixed-models Digest, Vol 89, Issue 22
> **************************************************


