[R-sig-ME] Random effects of logistic regression: bias towards the mean?

Tue Mar 25 09:06:29 CET 2014

Dear Tom,

It is hard to answer your question without the actual output of your model (not of your simulation), i.e. print(summary(model), corr = FALSE). It is also not clear to me what you are actually measuring, i.e. what counts as a "hit". Perhaps you can elaborate.

With kind regards

Tibor

On 25.03.2014 at 08:54, Tom Lentz wrote:

> Dear all,
> 
> The following question might be due to my poor understanding of logistic regression, in which case I would be very grateful for an explanation or a pointer to reading material.
> 
> As I currently understand it, logistic regression as typically done with lmer and family="binomial" (which actually calls glmer, since calling lmer this way is now deprecated) behaves in an unexpected way: the random effects do not centre on zero but are shifted towards chance, i.e. towards positive values if the probability of a hit is below 0.5 and towards negative values if it is above 0.5. At first I thought this was shrinkage, but it does not happen if the data are aggregated and an ordinary linear mixed model is fitted to the percentages. I find that approach ugly, and it should give worse or at best equal results, not better ones, because the percentages cannot be normally distributed, especially if they are far from chance.
> 
> I discovered this issue while analysing eye-tracking data in which the chance of looking at the target was around 0.25. The fixed effects in my model were lower than the mean, and the random effects for participant and item were not centred on zero: participants tended to do better than the fixed effects (the average) predicted, and items tended to be recognised better than the fixed effects predicted. As a result, the fixed-effect estimates are not at the average values, but lower.
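
When comparing fixed-effect estimates with an observed hit rate, it may help to keep in mind that a binomial model reports its estimates on the logit (log-odds) scale. The lines below are only a minimal sketch of that conversion for the 0.25 hit rate mentioned above; the variable names are made up for illustration and are not taken from the original analysis.

```r
# Fixed-effect estimates of a binomial model are on the logit (log-odds)
# scale, so they need back-transforming before being compared with an
# observed proportion.
p_target <- 0.25                    # hit rate mentioned in the message
logit_target <- qlogis(p_target)    # log(p / (1 - p))
print(logit_target)

# plogis() is the inverse of qlogis() and returns the probability.
p_back <- plogis(logit_target)
print(p_back)
```

An intercept read off the model summary can be passed through plogis() in the same way before it is compared with the raw proportion of hits.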
> 
> As my own data set might have had a poorly understood conspiracy in it, I simulated data. Every simulated data set had 40 participants and 40 items (easy if you make it up!) and no true fixed effects: there was a condition (A, B, C or D), but the outcome was not influenced by it. The dependent variable was drawn with rbinom(1600, 1, probability), where probability was varied: 0.1, 0.15, 0.2 and so on up to 0.9.
> 
> For each probability I ran 2000 analyses with this formula:
> lmer(outcome ~ cond + (1|i) + (1|p), data=dataset, family = "binomial")
> and looked at the random effects for item and participant. Indeed, the lower the hit rate (the probability of the dependent variable outcome being TRUE or 1), the higher the average random effect; the average random effect is zero only at a probability of 0.5 (a logit of 0). A plot can be found at <http://www.hum.uu.nl/medewerkers/t.o.lentz/plotRanefsR3.pdf>.
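
The simulation described above could be sketched roughly as follows for a single data set and a single fit. This is not the original script: the seed, the random assignment of conditions to trials, the choice of 0.25 as the hit probability and the direct call to glmer (instead of lmer with a family argument) are all assumptions made for illustration. The fit is skipped if lme4 is not installed.

```r
# One simulated data set and one fit; a sketch, not the original script.
set.seed(1)                 # arbitrary seed, only for reproducibility
n_p  <- 40                  # participants
n_i  <- 40                  # items
prob <- 0.25                # one value from the range 0.1 ... 0.9

# Fully crossed design: every participant sees every item (1600 trials).
dataset <- expand.grid(p = factor(seq_len(n_p)), i = factor(seq_len(n_i)))

# Assumption: conditions are assigned at random, 400 trials each.
cond_labels  <- rep(c("A", "B", "C", "D"), length.out = nrow(dataset))
dataset$cond <- factor(sample(cond_labels))

# The outcome ignores cond, p and i: there is no true effect of any kind.
dataset$outcome <- rbinom(nrow(dataset), 1, prob)

if (requireNamespace("lme4", quietly = TRUE)) {
  model <- lme4::glmer(outcome ~ cond + (1 | i) + (1 | p),
                       data = dataset, family = binomial)
  print(summary(model), correlation = FALSE)

  # Mean of the estimated random intercepts per grouping factor.
  re <- lme4::ranef(model)
  mean_ranef <- c(item        = mean(re$i[["(Intercept)"]]),
                  participant = mean(re$p[["(Intercept)"]]))
  print(mean_ranef)
}
```

Wrapping this in a loop over the probabilities and over repeated draws would give the per-probability averages described in the message. Since the simulated data contain no real between-participant or between-item variation, some fits may report a singular fit.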
> 
> The fixed effect of cond should not be significant, as the data are made up without regard to it. Indeed, at an alpha of 0.05, a spurious significant effect was found in only 4.2% of the simulations. So the analyses are not causing errors in hypothesis testing, but the estimates of the random effects are off. Is there a good explanation, or is this unexpected behaviour?
> 
> Version information: I detected the problem a while ago, back in R 2, and it still occurs in R 3.0.3 with lme4 version 1.1-5.
> 
> Thanks in advance for your help!
> 
> Kind regards,
> 
> Tom
> 
> TO Lentz PhD
> Postdoctoral Researcher,
> Parsing and Metrical Structure: Where Phonology Meets Processing
> 
> Utrecht Institute of Linguistics OTS
> Utrecht University
> Trans 10
> 3512 JK Utrecht
> Netherlands
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models