[R-sig-ME] Precision about the glmer model for Bernoulli variables

Emmanuel Curis emmanuel.curis at parisdescartes.fr
Wed Apr 22 10:13:29 CEST 2020


Thanks for the demonstration, the explanations, and the time spent on
this!  That clarifies the interpretation of the models a lot...

Best regards,

On Tue, Apr 21, 2020 at 05:33:13PM +0000, Vaida, Florin wrote:
> I forgot to mention that this proof works for any link function, any conditional distribution for the response, and any random effects distribution.
> In other words, it's not restricted to logit link, nor to binomial data, nor to normal random effects:
> 
> If the Y_j are iid conditional on u, then Cov(Y_j, Y_k) >= 0, with equality only under certain restrictive conditions.
> 
> 
> > On Apr 21, 2020, at 10:25 AM, Vaida, Florin <fvaida using health.ucsd.edu> wrote:
> > 
> > Hi Emmanuel,
> > 
> > So I can prove positive within-subject correlation for GLME logistic regression with random intercepts - assuming all observations have the same mean!
> > 
> > Let Y_j | u ~ Bernoulli( mu(u) ), with logit( mu(u) ) = beta + u and u ~ Normal(0, tau^2).
> > Using the conditional covariance formula you get
> > Cov(Y1, Y2) = E( Cov(Y1, Y2 | u) ) + Cov( E(Y1 | u), E(Y2 | u) ) = 0 + Cov( mu(u), mu(u) ) = Var( mu(u) ) >= 0, with equality only if tau^2 = 0.
> > 
> > This proof does not extend if you let the Y_j have different means, i.e., if you replace beta by beta_j.
> > It also does not apply to more general random effects structures, e.g. random intercepts and slopes.
> > Note however that for the *linear* model with random intercepts and slopes, the correlation is not guaranteed positive.
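> > 
> > A quick numerical check of this identity in R (a minimal sketch; the values beta = 0.5, tau = 1, and the sample size are arbitrary):
> > 
> > set.seed(42)
> > beta <- 0.5; tau <- 1
> > n  <- 1e6                     # number of simulated subjects
> > u  <- rnorm(n, 0, tau)        # random intercepts
> > mu <- plogis(beta + u)        # conditional success probability mu(u)
> > y1 <- rbinom(n, 1, mu)        # two conditionally independent
> > y2 <- rbinom(n, 1, mu)        #   Bernoulli draws per subject
> > cov(y1, y2)                   # empirical within-subject covariance
> > var(mu)                       # Var( mu(u) ), as in the proof above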
> > 
> > Florin
> > 
> > 
> >> On Apr 20, 2020, at 2:05 PM, Vaida, Florin <fvaida using health.ucsd.edu> wrote:
> >> 
> >> Hi Emmanuel,
> >> 
> >> That's a good question.  My guess is that the correlation is non-negative in general, but I wasn't able to prove it theoretically even in the simplest case, when Y1, Y2 ~ Bernoulli( p(u) ) are independent conditionally on u, with logit( p(u) ) = u and u ~ Normal(0, 1).  I am curious whether someone has a solution.
> >> We can't go too far down this route in this forum, since Doug wants to keep it applied.
> >> 
> >> Florin
> >> 
> >>> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
> >>> 
> >>> Hi Florin,
> >>> 
> >>> Thanks for the answer, the clarification about p(i,j), and the reference.
> >>> 
> >>> One last question, which I forgot in my message: is the resulting
> >>> correlation also always positive, as in the linear case? Or can a
> >>> negative correlation appear, depending on the values of pi(i,j) and
> >>> pi(i,j')?
> >>> 
> >>> Best regards,
> >>> 
> >>> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
> >>> « Hi Emmanuel,
> >>> « 
> >>> « Your reasoning is correct.
> >>> « 
> >>> « As a quibble, outside a simple repeated measures experiment setup, p(i,j) *does* depend on j.
> >>> « For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
> >>> « 
> >>> « There is almost certainly no closed form solution for the covariance under logit.
> >>> « I am not sure about the probit (my guess is not).
> >>> « There will be some Laplace approximations available, a la Breslow and Clayton 1993.
> >>> « 
> >>> « I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.
> >>> « 
> >>> « Florin
> >>> « 
> >>> « 
> >>> « > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
> >>> « > 
> >>> « > Hello everyone,
> >>> « > 
> >>> « > I hope you are all doing well in these difficult times.
> >>> « > 
> >>> « > I tried to understand in detail the exact model used when calling glmer
> >>> « > for a Bernoulli experiment, by comparison with the linear mixed
> >>> « > effects model, and especially how it introduces correlations between
> >>> « > observations of a given group.  I think I finally got it, but could
> >>> « > you check that what I write below is correct and that I am not missing
> >>> « > something?
> >>> « > 
> >>> « > I use a very simple case with only a single random effect, and no
> >>> « > fixed effects, because I assume that adding fixed effects or other
> >>> « > random effects does not change the idea, it "just" makes the formulas
> >>> « > more complex.  I write i for the random-effect level, say « patient »,
> >>> « > and j for the observation within this patient.
> >>> « > 
> >>> « > In the linear model, we have Y(i,j) = µ0 + Z(i) + epsilon(i,j), with
> >>> « > Z(i) and epsilon(i,j) random variables having probability densities,
> >>> « > mutually independent, and each iid.
> >>> « > 
> >>> « > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
> >>> « > positive correlation between observations of the same patient.
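> >>> « > 
> >>> « > A small simulation illustrates this (a minimal sketch; µ0 = 10 and
> >>> « > the variances are arbitrary):
> >>> « > 
> >>> « > set.seed(1)
> >>> « > n  <- 1e6
> >>> « > Z  <- rnorm(n, 0, 2)        # random effect, Var(Z) = 4
> >>> « > Y1 <- 10 + Z + rnorm(n)     # two observations sharing the same Z,
> >>> « > Y2 <- 10 + Z + rnorm(n)     #   with independent errors
> >>> « > cov(Y1, Y2)                 # close to Var(Z) = 4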
> >>> « > 
> >>> « > 
> >>> « > 
> >>> « > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) with pi(i,j) = f( Z(i) ),
> >>> « > f being the inverse link function, typically the inverse of the
> >>> « > logit (the logistic function). So we have
> >>> « > 
> >>> « > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i,j') ) - E( Y(i,j) ) E( Y(i,j') )
> >>> « >     = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
> >>> « > 
> >>> « > Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
> >>> « > 
> >>> « > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> « >  integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
> >>> « > 
> >>> « > Then, we assume that conditionally on Z(i), the Y(i,j) are independent;
> >>> « > is this right? This is the equivalent of « the epsilon(i,j) are
> >>> « > independent »? I assume this hypothesis is also used for computing the
> >>> « > likelihood? If not, what is the model for the joint probability?
> >>> « > 
> >>> « > In that case,
> >>> « > 
> >>> « > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> « >  integral(R) f(z) f(z) p( Z(i) = z ) dz
> >>> « > 
> >>> « > and since pi(i,j) = integral( R ) f(z) p( Z(i) = z ) dz we have
> >>> « > 
> >>> « > cov( Y(i,j), Y(i,j') ) =
> >>> « > integral( R ) f²(z) p( Z(i) = z ) dz -
> >>> « >  ( integral( R ) f(z) p( Z(i) = z ) dz )²
> >>> « > 
> >>> « > which in general has no reason to be zero, hence the two observations
> >>> « > are correlated. Is this correct?
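> >>> « > 
> >>> « > Whatever the answer, the covariance above is easy to evaluate by
> >>> « > numerical quadrature. A minimal sketch in R, assuming Z ~ Normal(0, 1)
> >>> « > and the logit link (so f is plogis):
> >>> « > 
> >>> « > tau <- 1                               # sd of the random effect Z
> >>> « > f   <- plogis                          # inverse logit link
> >>> « > p   <- function(z) dnorm(z, 0, tau)    # density of Z
> >>> « > m1 <- integrate(function(z) f(z)   * p(z), -Inf, Inf)$value  # E( f(Z) )
> >>> « > m2 <- integrate(function(z) f(z)^2 * p(z), -Inf, Inf)$value  # E( f(Z)^2 )
> >>> « > m2 - m1^2                              # cov( Y(i,j), Y(i,j') ) = Var( f(Z) )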
> >>> « > 
> >>> « > Is there any way to obtain a closed form for the covariance, for the
> >>> « > usual choices of f (say, logit or probit) and of the distribution of Z
> >>> « > (say, Gaussian)?
> >>> « > 
> >>> « > Thanks a lot for reading, and your answers,
> >>> « > 
> >>> « 
> >>> 
> >> 
> > 
> 

-- 
                                Emmanuel CURIS
                                emmanuel.curis using parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html


