[R-sig-ME] fixed effects estimates too small in binomial GLMM with low "success"-rate

Sun May 18 00:07:13 CEST 2014

Dear list,

I would really appreciate your input on an issue in which the fixed effects estimates of a binomial GLMM with relatively low rate of "successes" are too low (compared to the observed effect) by around .5%.

The data comes from six comparable experiments in which we measured participants' performance errors on a series of trials in a design with one factor with two levels ("a" and "b"). Overall the difference in error rate between the two levels was very small (~0.5%), but when analyzing all data (with participant and experiment as random effects) the effect of factor reached significance (p = .03), which is also consistent with other published data.

However, the estimated effects are around .5% lower than the observed mean error rates indicating that the model may perhaps not adequately describe the data. Furthermore, I also get a convergence warning ("Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)").

####### code 1 ##########
require(lme4)
dat <- read.table("http://pastebin.com/raw.php?i=KWjD5EG3")
dat$experiment <- factor(dat$experiment)

# observed effects
aggregate(errors ~ factor, dat, mean) #unweighted
##   factor     errors
## 1      a 0.02266914
## 2      b 0.02772540
by(dat, dat$factor, function(x) with(x, sum(errors*weights)/sum(weights))) #weighted
## dat$factor: a
## [1] 0.0222343
## ------------------------------------------------------------
## dat$factor: b
## [1] 0.02777651

m1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), dat, weights = dat$weights, family = binomial)
## Warnmeldung:
## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge with max|grad| = 0.0013716 (tol = 0.001)
coef(summary(m1))[2,, drop = FALSE]
##          Estimate Std. Error  z value   Pr(>|z|)
## factorb 0.1788881 0.08005533 2.234556 0.02544653

# estimated effect:
logistic <- function(x) 1/(1+exp(-x))
logistic(cumsum(fixef(m1)))
## (Intercept)     factorb
##  0.01825229  0.02174991

####### end code 1 ##########

Interestingly, I can replicate this behavior when simulating binomial data similar to the one I am having. When the overall mean is relatively small, the estimates from the GLMM are consistently too low. When increasing the overall mean, this consistent bias in the estimated effect seems to go away (and the observed effect gets larger due to the non-linear nature of the logistic model). The code of the simulation for one data set follows below.

My question is: Does anyone have an idea what I could do to reach a better agreement between observed and predicted values? I tried different link functions, but that didn't help. And I am unsure whether I can trust my model given this discrepancy.

Thanks a lot,
Henrik

##### code 2 #####

mu <- -3.7  # when larger, problem disappears
n_experiments <- 6
n_participants <- 20
n_trials <- 400
sigma_effect <- 0.2
effect_size <- 0.4
sigma_p <- 0.7
sigma_e <- 0.1
prob_level <- 0.5

# create data:
df <- expand.grid(experiment = factor(seq(1, n_experiments)), id = factor(seq(1, n_participants)), factor = c(-1, 1))
df$id <- with(df, experiment:id)
df <- merge(df, data.frame(experiment = seq(1, n_experiments), re_e = rnorm(n_experiments, sd = sigma_e)))
df <- merge(df, data.frame(id = levels(df$id), re_p = rnorm(length(levels(df$id)), sd = sigma_p)))
df <- merge(df, data.frame(id = levels(df$id), re_a = rnorm(length(levels(df$id)), sd = sigma_effect)))
df <- with(df, df[order(id, factor),])
tmp <- rbinom(n_experiments*n_participants, n_trials, prob_level)
tmp <- rep(tmp, each = 2)
tmp[c(FALSE, TRUE)] <- n_trials-tmp[c(FALSE, TRUE)]
df$errors <- mapply(function(x, y) rbinom(1, y, x), with(df, logistic(mu+re_p+(factor*(effect_size/2)+factor*re_a)+re_e)), tmp)/tmp
df$weights <- tmp
df$factor <- factor(df$factor)

aggregate(errors ~ factor, df, mean) #unweighted
##   factor     errors
## 1     -1 0.02352719
## 2      1 0.03718523

s1 <- glmer(errors ~ factor + (factor|id) + (1|experiment), df, weights = df$weights, family = binomial)
## Warnmeldung:
## In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge with max|grad| = 0.0130648 (tol = 0.001)
coef(summary(s1))[2,, drop = FALSE]
#          Estimate Std. Error  z value     Pr(>|z|)
# factor1 0.4765694 0.07578773 6.288213 3.211416e-10

logistic(cumsum(fixef(s1)))
# (Intercept)     factor1
#  0.01849934  0.02946117

#### end code 2 ####

-- 
Dr. Henrik Singmann
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann