[R-sig-ME] correspondence between intercept in a logit model and mean y response/probability

Wed Jun 29 16:23:09 CEST 2011

Thanks Jarrod, I figured it would be something elementary like that. Also, I see that:

mean(plogis(mod1@eta))
and
mean(mod1@mu)

both yield 0.3473491, which is very close to the mean of longdata$contact (0.3503684).

However, I don't really know what to make of the "predicted mode". The usual explanation of logit models says something like: (a) we're interested in probabilities; (b) we model the log-odds by necessity; and (c) having fitted a logit model, it's useful to convert the expected values for different combinations of covariates back to probabilities, using prob = exp(XB)/(1+exp(XB)). If "prob" in this case is estimated to be 0.2173616, whereas we know that the overall average probability in the dataset is 0.3503684... what gives? What's the relationship between the "predicted mode" and the "expected probability"?
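
To make the gap concrete, here is a minimal sketch in base R. It compares the back-transformed intercept with the average probability once random-effect variation is taken into account, using the approximation quoted below. Only the intercept (-1.28111) comes from this thread. The total variance v = 4 is an assumed placeholder, since the fitted variance components are not reported here, and the variable names are illustrative.

```r
# Sketch only: v is an ASSUMED value, not taken from the fitted model.
set.seed(1)
eta <- -1.28111   # fixed intercept on the logit scale (from the thread)
v   <- 4          # hypothetical sum of the random-effect variances

# f(E[x]): probability for a unit whose random effects are all zero
p_at_zero <- plogis(eta)

# E[f(x)]: average probability over the random-effect distribution,
# estimated by Monte Carlo with normally distributed deviations
u <- rnorm(1e6, mean = 0, sd = sqrt(v))
p_marginal_mc <- mean(plogis(eta + u))

# Closed-form approximation to E[f(x)] from Jarrod's reply
c2 <- ((16 * sqrt(3)) / (15 * pi))^2
p_marginal_approx <- plogis(eta / sqrt(1 + c2 * v))

print(round(c(at_zero     = p_at_zero,
              monte_carlo = p_marginal_mc,
              approx      = p_marginal_approx), 4))
```

With a negative intercept, the averaged probability is pulled towards 0.5 and so sits above plogis(eta); the larger v is, the larger the gap. Substituting the variance components actually estimated for mod1 would show how much of the 0.217 versus 0.35 difference this accounts for.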

Much appreciated,
Malcolm

On 29 Jun 2011, at 13:48, Jarrod Hadfield wrote:

> Hi,
> 
> 0.2173616 is the predicted mode. The inverse-logit transform is non-linear so f(E[x]) does not equal E[f(x)].
> 
> E[f(x)] can be approximated (well) as:
> 
> c2<-((16*sqrt(3))/(15*pi))^2
> plogis(eta/sqrt(1+c2*v))
> 
> where eta is the linear predictor on the link scale (the intercept in your case), and v is the variation around the linear predictor on the link scale (probably the sum of your variance components).
> 
> Jarrod
> 
> Quoting Malcolm Fairbrother <m.fairbrother at bristol.ac.uk> on Wed, 29 Jun 2011 10:31:25 +0100:
> 
>> Dear list,
>> 
>> I'm fitting a mixed logit model with lme4, and finding something that seems weird to me, but probably has a simple explanation. I suspect someone on this list will be able to clarify what's going on. In brief, the issue is the correspondence between the intercept term in a mixed logit model and the mean response/probability of an outcome across all units.
>> 
>> The mean of my binary response variable is about 0.35:
>> 
>>> mean(longdata$contact)
>> [1] 0.3503684
>> 
>> But when I fit mod1 below, the Intercept is estimated to be -1.28111, which does NOT correspond to this mean response:
>> 
>>> mod1 <- lmer(contact ~ 1 + (1 | group) + (1 | id), longdata, family=binomial)
>>> plogis(fixef(mod1))
>> (Intercept)
>> 0.2173616
>> 
>> Huh? Why is this happening? Is it something to do with the shrinkage that occurs because of the clustering in higher-level units? I would have expected an intercept term close to the log-odds equivalent of a probability of 0.35. I presume the difference between empirical and modelled mean probability isn't indicative of any big problems, and indeed might be a useful result, but I'd like to know what I should understand by it.
>> 
>> Any help would be much appreciated (and apologies for posting a lot to this list recently).
>> 
>> - Malcolm
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models