[R-sig-ME] Random effects of logistic regression: bias towards the mean?
Tom Lentz
t.o.lentz at uu.nl
Tue Mar 25 08:54:11 CET 2014
Dear all,
The following question might be due to my poor understanding of logistic
regression, in which case I would be very grateful for an explanation or
a pointer to reading material.
As I currently understand it, logistic regression as typically fitted
with lmer and family="binomial" (which actually calls glmer, as calling
lmer this way is now deprecated) behaves in an unexpected way: the
random effects are not centred on zero but are pushed towards chance,
i.e. towards positive values if the probability of a hit is below 0.5
and towards negative values if it is above 0.5. At first I thought this
was shrinkage, but it does not happen if the data are aggregated and a
normal linear mixed model is fitted to percentages. I find that approach
ugly, and it should give worse or equal results, not better ones,
because the percentages cannot be normally distributed, especially if
they are far from chance.
I discovered this issue while analysing eye-tracking data in which the
chance of looking at the target was around 0.25. The fixed-effect
estimates in my model were lower than the observed mean, and the random
effects for participant and item were not centred on zero: participants
tended to do better than the fixed effects predict, and items tended to
be recognised better than the fixed effects predict. As a result, the
fixed-effect estimates do not sit at the average values but below them.
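For readers comparing a logit-scale estimate with a raw proportion, the
conversion can be sketched in R as below. The intercept value is
hypothetical and chosen only for illustration; it is not taken from the
eye-tracking data described above.

```r
# Hypothetical numbers for illustration only; not from the post's data.
observed_rate   <- 0.25   # raw proportion of looks at the target
intercept_logit <- -1.3   # hypothetical fixed-effect intercept (logit scale)

# Express the observed rate in logits, and the intercept as a probability,
# so that the two can be compared on the same scale.
observed_logit <- qlogis(observed_rate)
intercept_prob <- plogis(intercept_logit)

print(c(observed_logit = observed_logit, intercept_logit = intercept_logit))
print(c(observed_rate = observed_rate, intercept_prob = intercept_prob))
```

If the back-transformed intercept is below the observed rate, the model
places the "average" participant and item below the raw mean, which is
the pattern described above.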
As my data set might contain some quirk that I do not understand, I
simulated data. Every simulated data set had 40 participants and 40
items (easy if you make it up!), but no true fixed effect: there was a
condition (A, B, C or D), but the outcome was not influenced by it. The
dependent variable was drawn with rbinom(1600, 1, probability), where
probability was varied in steps of 0.05: 0.1, 0.15, 0.2, up to 0.9.
For each probability I ran 2000 analyses with this formula:
lmer(outcome ~ cond + (1|i) + (1|p), data=dataset, family = "binomial")
and looked at the random effects for items and participants. Indeed,
the lower the hit rate (the probability of the dependent variable
outcome being TRUE or 1), the higher the average random effect; the
random effects average zero only at a probability of 0.5 (a logit of
0). A plot can be found at
<http://www.hum.uu.nl/medewerkers/t.o.lentz/plotRanefsR3.pdf>.
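One run of the simulation described above might look roughly like the
following sketch. The helper name simulate_dataset and its arguments are
illustrative and not from the original analysis, and the random
per-trial assignment of cond is an assumption, since the post does not
say how conditions were allocated. The model is fitted with glmer()
directly rather than through the deprecated lmer() route.

```r
# Sketch only: simulate one data set as described and fit it with glmer().
# simulate_dataset, p_hit, n_p and n_i are illustrative names.
simulate_dataset <- function(p_hit, n_p = 40, n_i = 40) {
  # Fully crossed design: each participant sees each item once (1600 rows).
  d <- expand.grid(p = factor(seq_len(n_p)), i = factor(seq_len(n_i)))
  # Assumption: the four conditions are balanced and assigned at random.
  d$cond <- factor(sample(rep(c("A", "B", "C", "D"), length.out = nrow(d))))
  # The outcome ignores cond, p and i: a Bernoulli draw with fixed probability.
  d$outcome <- rbinom(nrow(d), 1, p_hit)
  d
}

set.seed(1)
dataset <- simulate_dataset(p_hit = 0.25)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(outcome ~ cond + (1 | i) + (1 | p),
                     data = dataset, family = binomial)
  # Mean of the conditional modes per grouping factor, on the logit scale.
  re <- lme4::ranef(fit)
  print(sapply(re, function(x) mean(x[["(Intercept)"]])))
}
```

Because the simulated data have no true participant or item variance,
glmer() may report a singular fit; that is expected for this setup.
Repeating the call over a grid of p_hit values gives the per-probability
averages discussed above.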
The fixed effect of cond should not be significant, as the data are made
up without regard to it. Indeed, at an alpha of 0.05 a spurious
significant effect was found in only 4.2% of the simulations. So the
analyses do not cause errors in hypothesis testing, but the estimates
of the random effects are off. Is there a good explanation, or is this
unexpected behaviour?
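A check of the spurious-significance rate could be sketched as follows.
The post does not say which test was used for cond, so the
likelihood-ratio test here is an assumption, as is the random
assignment of conditions; one_p_value and n_sim are illustrative names,
and n_sim is kept far below 2000 so the sketch runs quickly.

```r
# Sketch only: how often does cond look "significant" when it has no effect?
# Assumption: likelihood-ratio test of models with and without cond.
library(lme4)

one_p_value <- function(p_hit) {
  d <- expand.grid(p = factor(1:40), i = factor(1:40))
  d$cond <- factor(sample(rep(c("A", "B", "C", "D"), length.out = nrow(d))))
  d$outcome <- rbinom(nrow(d), 1, p_hit)
  full <- glmer(outcome ~ cond + (1 | i) + (1 | p), data = d, family = binomial)
  null <- glmer(outcome ~ 1 + (1 | i) + (1 | p), data = d, family = binomial)
  # Row 2 of the comparison table holds the test of the larger model.
  anova(null, full)[2, "Pr(>Chisq)"]
}

set.seed(2)
n_sim <- 20
# Singular-fit messages are expected here and are suppressed.
p_values <- replicate(n_sim, suppressMessages(one_p_value(0.25)))
print(mean(p_values < 0.05))  # proportion of spurious "significant" results
```

With so few replications the proportion is very noisy; a rate near the
nominal alpha only becomes visible with many more runs.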
Version information: I first noticed the problem a while ago, under R
2.x, and it still occurs in R 3.0.3 with lme4 version 1.1-5.
Thanks in advance for your help!
Kind regards,
Tom
TO Lentz PhD
Postdoctoral Researcher,
Parsing and Metrical Structure: Where Phonology Meets Processing
Utrecht Institute of Linguistics OTS
Utrecht University
Trans 10
3512 JK Utrecht
Netherlands