[R-sig-ME] estimation of intercept in binomial glmer

Tue Dec 3 22:16:07 CET 2013

Björn Lindström <Bjorn.Lindstrom at ...> writes:

> 
> Dear all,

> I have a data set with 25 subjects, all with 20 binary responses
> (psychological learning task). Many subjects gave the 1 response
> (let's call this response A and the 0 response B) throughout the
> task.

> My goal is to estimate the probability of A (P(A)), and whether it is
> above chance (the latter is trivial in this data set, but I have
> several other similar sets where it's more of an issue).

> If I calculate the proportion of A responses for each subject (mean,
> na.rm=T), the sample mean is 0.757 (the sample distribution of
> proportion A is very skewed toward 1, with a few all 0 respondents).

> If I instead use glmer:
> glmer(RespondA~1+(1|Subject),family=binomial,data=data),

> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)    3.660      1.143   3.201  0.00137 **
> 
> with an estimate that is far above 0.757: plogis(3.66) = 0.974.
> This estimate is close to the sample median
> (md = 1), but does it make sense?
> 
> Ordinary glm, ignoring the Subject factor, gives an intercept closer to
> the sample mean:
> 
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)   1.1995     0.1073   11.18   <2e-16 ***
> 
> (plogis(1.1995) = 0.768)

> Can someone please illuminate what's happening here? Is it shrinkage
> in the GLMM? Seems a bit much for just the intercept, right?
> Overdispersion (don't know much about that...)?

  This is an interesting question; I find it hard to answer precisely
without seeing the original data, but it doesn't surprise me 
very much that in this kind of extreme situation (with complete
or near-complete separation for some of the respondents) the results
from naive averaging, GLM estimation (which, for an intercept-only
model, returns the logit of the pooled proportion of A responses),
and GLMM estimation would differ
considerably.  The GLMM intercept represents (roughly) the population
average across individuals of the log-odds response, while the GLM
intercept represents the population average across observations.
You might get some enlightenment out of the relevant section from
Agresti's _Categorical Data Analysis_ book (sorry, don't have it
with me) on marginal vs conditional estimates ...
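
A rough sketch of the marginal vs conditional distinction, using only
base R. It takes the GLMM intercept reported in the question as given
and averages plogis() over a normal random intercept. The among-subject
SD was not reported, so the sigma values below are purely hypothetical,
and marginal_p() is just an illustrative helper name, not something
from lme4.

```r
# Marginal P(A) implied by a random-intercept logistic model:
# average plogis(b0 + sigma * z) over z ~ N(0, 1).
# (Illustrative helper; numerical integration over +/- 8 SDs.)
marginal_p <- function(b0, sigma) {
  integrate(function(z) plogis(b0 + sigma * z) * dnorm(z),
            lower = -8, upper = 8)$value
}

b0     <- 3.66              # GLMM intercept reported in the question
sigmas <- c(0, 1, 2, 4, 8)  # hypothetical among-subject SDs (assumed, not from the data)

# Conditional probability (for a "typical" subject) vs the marginal
# probability averaged over subjects, for each assumed sigma.
data.frame(sigma       = sigmas,
           conditional = plogis(b0),
           marginal    = sapply(sigmas, marginal_p, b0 = b0))
```

With sigma = 0 the two columns agree; as sigma grows, the marginal
probability is pulled toward 0.5 and so falls below plogis(b0). A large
among-subject variance could therefore account for at least part of the
gap between the GLMM intercept and the sample mean.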

 Ben Bolker