[R-sig-ME] Precision about the glmer model for Bernoulli variables

Emmanuel Curis emmanuel.curis at parisdescartes.fr
Mon Apr 20 09:48:51 CEST 2020

Hello everyone,

I hope you are all doing well in these difficult times.

I tried to understand in detail the exact model used when calling glmer
for a Bernoulli outcome, by comparison with the linear mixed-effects
model, and especially how it introduces correlations between
observations of a given group.  I think I finally got it, but could
you check that what I write below is correct and that I am not missing
anything?

I use a very simple case with only a single random effect, and no
fixed effects, because I guess that adding fixed effects or other
random effects does not change the idea, it "just" makes the formulas
more complex.  I denote by i the random-effect level, let's say
« patient », and by j the observation for this patient.

In the linear model, we have Y(i,j) = µ0 + Z(i) + epsilon(i,j), with
Z(i) and epsilon(i,j) random variables having probability densities,
independent of each other, and each iid.

Hence, for j ≠ j', cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model
introduces a positive correlation between observations of the same
patient.
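This can be checked by simulation; a minimal sketch (the numerical
values µ0, sd_z and sd_eps are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mu0, sd_z, sd_eps = 1.0, 2.0, 1.0               # illustrative values
n_patients = 200_000

z = rng.normal(0.0, sd_z, n_patients)           # Z(i), shared within patient
eps = rng.normal(0.0, sd_eps, (n_patients, 2))  # epsilon(i,j), j = 1, 2
y = mu0 + z[:, None] + eps                      # Y(i,j) = mu0 + Z(i) + epsilon(i,j)

# Empirical covariance between the two observations of the same patient:
# it should be close to Var(Z) = sd_z**2 = 4
emp_cov = np.cov(y[:, 0], y[:, 1])[0, 1]
print(emp_cov)
```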

In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
f being the inverse link function, typically the inverse of the
logit. So we have

cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i,j') ) - E( Y(i,j) ) E( Y(i,j') )
     = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')

Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').

Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
  integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz

Then, we assume that, conditionally on Z(i), the Y(i,j) are
independent, is this right? This is the equivalent of « the
epsilon(i,j) are independent »? I assume this hypothesis is also used
for computing the likelihood? If not, what is the model for the joint
probability?

In that case,

Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
  integral(R) f(z) f(z) p( Z(i) = z ) dz

and since pi(i,j) = integral( R ) f(z) p( Z(i) = z ) dz we have

cov( Y(i,j), Y(i,j') ) =
 integral( R ) f²(z) p( Z(i) = z ) dz -
  ( integral( R ) f(z) p( Z(i) = z ) dz )²

which in general has no reason to be zero, hence the two observations
are correlated. Is this correct?
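The sign and size of this covariance can be checked numerically; a
sketch for a Gaussian Z and the logit link, comparing the integral
formula above with a direct Monte Carlo simulation of Bernoulli pairs
(the value of sd_z is arbitrary):

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import expit   # numerically stable inverse logit

rng = np.random.default_rng(0)
sd_z = 1.5                        # illustrative SD of the random effect Z(i)

# E[f(Z)] and E[f(Z)^2] by numerical integration over the Gaussian density
phi = stats.norm(scale=sd_z).pdf
e_f,  _ = integrate.quad(lambda z: expit(z) * phi(z), -np.inf, np.inf)
e_f2, _ = integrate.quad(lambda z: expit(z)**2 * phi(z), -np.inf, np.inf)
cov_quad = e_f2 - e_f**2          # the covariance from the integral formula

# Monte Carlo check: Bernoulli pairs sharing the same Z(i)
n = 400_000
z = rng.normal(0.0, sd_z, n)
y1 = rng.random(n) < expit(z)
y2 = rng.random(n) < expit(z)
cov_mc = np.cov(y1, y2)[0, 1]
print(cov_quad, cov_mc)           # both positive and close to each other
```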

Is there any way to obtain a closed form for the covariance, for the
usual choices of f (say, inverse logit or probit) and of the Z
distribution (say, Gaussian)?
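At least one special case I am aware of: for the probit link (f = Phi)
with Z Gaussian of variance sigma² and no fixed effects, the classical
identity E[ Phi(Z)² ] = Phi2( 0, 0; rho ) with rho = sigma²/(1+sigma²)
(a bivariate normal orthant probability) gives cov = arcsin(rho)/(2 pi).
A sketch checking this against the integral formula (sd_z is arbitrary):

```python
import numpy as np
from scipy import integrate, stats

sd_z = 1.5                              # illustrative SD of Z(i)
rho = sd_z**2 / (1.0 + sd_z**2)         # induced latent correlation

# Closed form for the probit link with no fixed effects (µ0 = 0):
# cov = Phi2(0, 0; rho) - Phi(0)^2 = arcsin(rho) / (2*pi)
cov_closed = np.arcsin(rho) / (2.0 * np.pi)

# Check against the integral expression of the covariance
phi = stats.norm(scale=sd_z).pdf
e_f,  _ = integrate.quad(lambda z: stats.norm.cdf(z) * phi(z), -np.inf, np.inf)
e_f2, _ = integrate.quad(lambda z: stats.norm.cdf(z)**2 * phi(z), -np.inf, np.inf)
cov_int = e_f2 - e_f**2
print(cov_closed, cov_int)              # should agree
```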

Thanks a lot for reading, and your answers,

                                Emmanuel CURIS
                                emmanuel.curis at parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html
