[R-sig-ME] Random effects of logistic regression: bias towards the mean?

Tue Mar 25 08:54:11 CET 2014

Dear all,

The following question might be due to my poor understanding of logistic 
regression, in which case I would be very grateful for an explanation or 
a pointer to reading material.

As I currently understand it, logistic regression as typically done 
with lmer and family="binomial" (which actually calls glmer, as calling 
lmer this way is now deprecated) behaves in an unexpected way: the 
random effects are not centred on zero but shifted towards chance, 
i.e. towards positive values if the probability of a hit is below 0.5 
and towards negative values if it is above 0.5. At first I thought this 
was shrinkage, but it does not happen if the data are aggregated and an 
ordinary linear mixed model is fitted to the percentages. I find that 
approach ugly, and it should give worse or at best equal results, not 
better ones, because percentages cannot be normally distributed, 
especially when they are far from chance.

I discovered this issue while analysing eye-tracking data in which the 
chance of looking at the target was around 0.25. The fixed effects in 
my model were lower than the observed mean, and the random effects for 
participant and item were not centred on zero: participants tended to 
do better than the fixed effects predicted, and items tended to be 
recognised better than the fixed effects predicted. As a result, the 
fixed-effect estimates are not at the average values, but lower.

As my data set might have had a poorly understood conspiracy in it, I 
simulated data. Every simulated data set had 40 participants and 40 
items (easy if you make it up!), but no true fixed effect: there was a 
condition (A, B, C or D), but it did not influence the outcome. The 
dependent variable was drawn with rbinom(1600, 1, probability), where 
probability was varied from 0.1 to 0.9 in steps of 0.05.
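
A minimal sketch of how one such data set could be generated, not the 
exact script used. The helper name simulate_dataset, the seed, and the 
random per-trial assignment of cond are assumptions made for 
illustration; only the 40 x 40 design, the four conditions and the 
rbinom call come from the description above.

```r
# Sketch of one simulated data set: 40 participants x 40 items, with no
# true effect of cond. The function name and the random assignment of cond
# are assumptions; the description above does not say how cond was assigned.
simulate_dataset <- function(probability, n_p = 40, n_i = 40) {
  # every participant sees every item once: 40 * 40 = 1600 trials
  dataset <- expand.grid(p = factor(seq_len(n_p)), i = factor(seq_len(n_i)))
  n <- nrow(dataset)
  # condition label, unrelated to the outcome by construction
  dataset$cond <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE),
                         levels = c("A", "B", "C", "D"))
  # outcome depends only on the overall hit probability
  dataset$outcome <- rbinom(n, 1, probability)
  dataset
}

set.seed(1)                                # arbitrary, for reproducibility
probabilities <- seq(0.1, 0.9, by = 0.05)  # 0.1, 0.15, ..., 0.9
dataset <- simulate_dataset(0.25)          # one data set at a hit rate of 0.25
```

Looping over probabilities and repeating the call 2000 times per value 
would give the full set of simulated data sets.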

For each probability I ran 2000 analyses with this call:
lmer(outcome ~ cond + (1|i) + (1|p), data=dataset, family = "binomial")
and looked at the random effects for items and participants. Indeed, 
the lower the hit rate (the probability of the dependent variable 
outcome being TRUE or 1), the higher the average random effect; the 
average is zero only at a probability of 0.5 (a logit of 0). A plot can 
be found at 
<http://www.hum.uu.nl/medewerkers/t.o.lentz/plotRanefsR3.pdf>.
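
The averages in question can be extracted from a single fit roughly as 
follows. This is a sketch under assumptions: it needs lme4 installed, 
calls glmer() directly, and builds a stand-in data set with a randomly 
assigned cond. The names mean_item and mean_part are made up here.

```r
library(lme4)

# Hypothetical stand-in data set with the layout described above
set.seed(1)  # arbitrary seed
dataset <- expand.grid(p = factor(1:40), i = factor(1:40))
dataset$cond <- factor(sample(c("A", "B", "C", "D"), nrow(dataset),
                              replace = TRUE),
                       levels = c("A", "B", "C", "D"))
dataset$outcome <- rbinom(nrow(dataset), 1, 0.25)

# Same model as above, fitted through glmer(), the non-deprecated entry point.
# No random effects were simulated, so lme4 may report a singular fit.
fit <- glmer(outcome ~ cond + (1 | i) + (1 | p),
             data = dataset, family = binomial)

# ranef() returns one data frame per grouping factor (here i and p)
re <- ranef(fit)
mean_item <- mean(re$i[["(Intercept)"]])
mean_part <- mean(re$p[["(Intercept)"]])

# Intercept (condition A, logit scale) next to the logit of the raw hit rate
c(intercept = unname(fixef(fit)["(Intercept)"]),
  raw_logit = qlogis(mean(dataset$outcome)),
  mean_item = mean_item,
  mean_part = mean_part)
```

Everything here is on the logit scale, so a hit rate of 0.5 corresponds 
to qlogis(0.5) = 0.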

The fixed effect of cond should not be significant, as the data were 
generated without regard to it. Indeed, at an alpha of 0.05 a spurious 
significant effect was found in only 4.2% of the simulations. So the 
analyses do not cause problems for hypothesis testing, but the 
estimates of the random effects are off. Is there a good explanation, 
or is this unexpected behaviour?

Version information: I first noticed the problem a while ago, under R 
2, and it still occurs in R 3.0.3 with lme4 version 1.1-5.

Thanks in advance for your help!

Kind regards,

Tom

TO Lentz PhD
Postdoctoral Researcher,
Parsing and Metrical Structure: Where Phonology Meets Processing

Utrecht Institute of Linguistics OTS
Utrecht University
Trans 10
3512 JK Utrecht
Netherlands