[R-sig-ME] Precision about the glmer model for Bernoulli variables

Vaida, Florin  vaida at health.ucsd.edu
Mon Apr 20 17:27:39 CEST 2020


Hi Emmanuel,

Your reasoning is correct.
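
In particular, yes: conditionally on Z(i) the Y(i,j) are independent,
and that same conditional independence is what the marginal likelihood
integrates over. A minimal sketch of one group's likelihood
contribution (base R, not lme4 internals; sigma and the data are
made-up toy values):

## L_i = integral over z of prod_j f(z)^y_j (1 - f(z))^(1 - y_j) * N(z; 0, sigma^2) dz
group_lik <- function(y, sigma, f = plogis) {
  integrate(function(z) {
    sapply(z, function(zz) prod(dbinom(y, size = 1, prob = f(zz)))) *
      dnorm(z, sd = sigma)
  }, -Inf, Inf)$value
}
group_lik(c(1, 0, 1, 1), sigma = 1)  # one patient, four binary observations

(glmer itself approximates this integral with the Laplace method, or
adaptive Gauss-Hermite quadrature when nAGQ > 1, rather than calling
integrate().)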

As a quibble, outside a simple repeated-measures setup, p(i,j) *does* depend on j.
For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
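
In lme4 terms that is just a fixed effect in the glmer formula. A toy
illustration (all data and variable names are made up):

library(lme4)
set.seed(1)
## 50 patients, 6 time points each, random intercept per patient
d <- data.frame(patient = factor(rep(1:50, each = 6)),
                time    = rep(0:5, times = 50))
b <- rnorm(50)
d$y <- rbinom(nrow(d), 1, plogis(-0.5 + 0.3 * d$time + b[d$patient]))
## here p(i,j) depends on j through the time covariate
m1 <- glmer(y ~ time + (1 | patient), family = binomial, data = d)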

There is almost certainly no closed-form solution for the covariance under the logit link.
I am not sure about the probit (my guess is not).
There will be some Laplace-type approximations available, à la Breslow and Clayton (1993).
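
Numerically, though, the covariance is just a cheap one-dimensional
integral. A quick sketch in base R (sigma = 1 is an arbitrary choice):

## cov( Y(i,j), Y(i,j') ) = E[ f(Z)^2 ] - E[ f(Z) ]^2, with Z ~ N(0, sigma^2)
marg_cov <- function(f, sigma) {
  m1 <- integrate(function(z) f(z)   * dnorm(z, sd = sigma), -Inf, Inf)$value
  m2 <- integrate(function(z) f(z)^2 * dnorm(z, sd = sigma), -Inf, Inf)$value
  m2 - m1^2
}
marg_cov(plogis, sigma = 1)  # logit link
marg_cov(pnorm,  sigma = 1)  # probit link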

I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.

Florin


> On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
> 
> Hello everyone,
> 
> I hope you're all doing well in these difficult times.
> 
> I tried to understand in detail the exact model used when calling
> glmer for a Bernoulli experiment, by comparison with the linear
> mixed-effects model, and especially how it introduces correlations
> between observations of a given group.  I think I finally got it, but
> could you check that what I write below is correct and that I'm not
> missing something?
> 
> I use a very simple case with only a single random effect and no
> fixed effects, because I guess that adding fixed effects or other
> random effects does not change the idea, it "just" makes the formulas
> more complex.  I denote by i the random-effect level, let's say
> « patient », and by j the observation for this patient.
> 
> In the linear model, we have Y(i,j) = µ0 + Z(i) + epsilon(i,j), with
> Z(i) and epsilon(i,j) independent random variables, each iid, having
> a probability density.
> 
> Hence, for j ≠ j', cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model
> introduces a positive correlation between observations of the same
> patient.
> 
> 
> 
> In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
> f being the inverse link function, typically the inverse of the
> logit (the logistic function). So we have
> 
> cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i, j') ) - E( Y(i,j) ) E( Y(i,j') )
>     = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
> 
> Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
> 
> Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
>  integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
> 
> Then, we assume that conditionally on Z(i), the Y(i,j) are
> independent, is this right? This is the equivalent of « the
> epsilon(i,j) are independent »? I assume this hypothesis is also used
> for computing the likelihood? If not, what is the model for the joint
> probability?
> 
> In that case,
> 
> Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
>  integral(R) f(z) f(z) p( Z(i) = z ) dz
> 
> and since pi(i,j) = integral( R ) f(z) p( Z(i) = z ) dz we have
> 
> cov( Y(i,j), Y(i,j') ) =
> integral( R ) f²(z) p( Z(i) = z ) dz -
>  ( integral( R ) f(z) p( Z(i) = z ) dz )²
> 
> which in general has no reason to be zero, hence the two observations
> are correlated. Is this correct?
> 
> Is there any way to have a closed form for the covariance, for the
> usual f (let's say, logit or probit) and Z distribution (let's say,
> Gaussian)?
> 
> Thanks a lot for reading, and your answers,
> 
> -- 
>                                Emmanuel CURIS
>                                emmanuel.curis at parisdescartes.fr
> 
> Page WWW: http://emmanuel.curis.online.fr/index.html
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
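
As a quick check of the covariance formula in your last question, a
Monte Carlo simulation agrees with the one-dimensional integral (toy
values, base R):

set.seed(2)
sigma <- 1
n <- 2e5                        # simulated patients, two observations each
z  <- rnorm(n, sd = sigma)
y1 <- rbinom(n, 1, plogis(z))   # conditionally independent given z
y2 <- rbinom(n, 1, plogis(z))
cov(y1, y2)                     # empirical covariance
## integral(R) f(z)^2 p(z) dz - ( integral(R) f(z) p(z) dz )^2
m1 <- integrate(function(z) plogis(z)   * dnorm(z, sd = sigma), -Inf, Inf)$value
m2 <- integrate(function(z) plogis(z)^2 * dnorm(z, sd = sigma), -Inf, Inf)$value
m2 - m1^2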


