[R-sig-ME] Precision about the glmer model for Bernoulli variables
Vaida, Florin
fvaida at health.ucsd.edu
Tue Apr 21 19:25:33 CEST 2020
Hi Emmanuel,
So I can prove that the within-subject correlation is positive for a GLMM logistic regression with random intercepts - assuming all observations have the same mean!
Let Yj ~ Bernoulli(mu), logit(mu) = beta + u, u ~ Normal(0, tau^2).
Using the conditional covariance formula (law of total covariance) you get
Cov(Y1, Y2) = E(Cov(Y1, Y2 | u)) + Cov(E(Y1 | u), E(Y2 | u)) = 0 + Cov(mu(u), mu(u)) = Var(mu(u)) >= 0, with equality only if tau^2 = 0. (The first term vanishes because Y1 and Y2 are independent given u.)
This proof does not extend if you let Yj have different means, i.e., replace beta by beta_j.
It also does not apply to more general random effects structures, e.g. random intercepts and slopes.
Note however that for the *linear* model with random intercepts and slopes, the correlation is not guaranteed positive.
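As a quick numerical check (a minimal R sketch; beta and tau below are arbitrary values chosen only for illustration), simulation reproduces the decomposition above:

set.seed(1)
beta <- 0.5; tau <- 1.2; n <- 1e6
u  <- rnorm(n, 0, tau)        # random intercepts u ~ Normal(0, tau^2)
mu <- plogis(beta + u)        # mu(u) = inverse logit of (beta + u)
y1 <- rbinom(n, 1, mu)        # Y1 | u ~ Bernoulli(mu(u))
y2 <- rbinom(n, 1, mu)        # Y2 | u, independent of Y1 given u
cov(y1, y2)                   # Monte Carlo estimate of Cov(Y1, Y2)
var(mu)                       # Var(mu(u)); the two should agree

Both numbers agree up to Monte Carlo error and are strictly positive whenever tau > 0.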
Florin
> On Apr 20, 2020, at 2:05 PM, Vaida, Florin <fvaida using health.ucsd.edu> wrote:
>
> Hi Emmanuel,
>
> That's a good question. My guess is that the correlation is generally non-negative, but I wasn't able to prove it theoretically even in the simplest case where Y1, Y2 ~ Bernoulli(mu) with logit(mu) = u, independent conditionally on u, and u ~ Normal(0, 1). I am curious whether someone has a solution.
> We can't go too far down this route in this forum, since Doug wants to keep it applied.
>
> Florin
>
>> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
>>
>> Hi Florin,
>>
>> Thanks for the answer, the clarification about p(i,j), and the reference.
>>
>> One last question, which I forgot in my message: is the resulting
>> correlation also always positive, as in the linear case? Or can a
>> negative correlation appear, depending on the values of pi(i,j) and
>> pi(i,j')?
>>
>> Best regards,
>>
>> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
>> « Hi Emmanuel,
>> «
>> « Your reasoning is correct.
>> «
>> « As a quibble, outside a simple repeated measures experiment setup, p(i,j) *does* depend on j.
>> « For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
>> «
>> « There is almost certainly no closed form solution for the covariance under logit.
>> « I am not sure about the probit (my guess is not).
>> « There will be some Laplace approximations available, a la Breslow and Clayton 1993.
>> «
>> « I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.
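>> «
>> « As a crude first-order illustration of what such an approximation looks like (a delta-method sketch in R, not Breslow & Clayton's formula; beta and tau are arbitrary values chosen only for illustration), one can compare Var(plogis(beta + u)) with [p0 (1 - p0)]^2 tau^2, where p0 = plogis(beta):
>> «
>> « beta <- 0.5; tau <- 1.2
>> « p0 <- plogis(beta)
>> « (p0 * (1 - p0))^2 * tau^2                       # delta-method approximation
>> « m1 <- integrate(function(z) plogis(beta + z) * dnorm(z, 0, tau),
>> «                 -Inf, Inf)$value                # E[ mu(u) ]
>> « m2 <- integrate(function(z) plogis(beta + z)^2 * dnorm(z, 0, tau),
>> «                 -Inf, Inf)$value                # E[ mu(u)^2 ]
>> « m2 - m1^2                                       # numerically "exact" covariance
>> «
>> « The two values drift apart as tau grows, which is where better approximations become relevant.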
>> «
>> « Florin
>> «
>> «
>> « > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
>> « >
>> « > Hello everyone,
>> « >
>> « > I hope you're all going fine in these difficult times.
>> « >
>> « > I tried to understand in detail the exact model used by glmer
>> « > for a Bernoulli experiment, by comparison with the linear mixed-
>> « > effects model, and especially how it introduces correlations between
>> « > observations of a given group. I think I finally got it, but could
>> « > you check that what I write below is correct and that I'm not
>> « > missing something?
>> « >
>> « > I use a very simple case with only a single random effect and no
>> « > fixed effects, because I guess that adding fixed effects or other
>> « > random effects does not change the idea, it "just" makes the
>> « > formulas more complex. I denote by i the random-effect level, say
>> « > « patient », and by j the observation for this patient.
>> « >
>> « > In the linear model, we have Y(i,j) = µ0 + Z(i) + epsilon(i,j), with
>> « > Z(i) and epsilon(i,j) random variables having probability densities,
>> « > independent of each other, and each iid.
>> « >
>> « > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
>> « > positive correlation between observations of the same patient.
>> « >
>> « >
>> « >
>> « > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
>> « > f being the inverse link function, typically the inverse of the
>> « > logit. So we have
>> « >
>> « > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i, j') ) - E( Y(i,j) ) E( Y(i,j') )
>> « > = Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
>> « >
>> « > Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
>> « >
>> « > Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) =
>> « > integral(R) Pr( Y(i,j) = 1 inter Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
>> « >
>> « > Then, we assume that conditionally on Z(i), the Y(i,j) are
>> « > independent, is this right? This is the equivalent of « the
>> « > epsilon(i,j) are independent »? I assume this hypothesis is also
>> « > used for computing the likelihood? If not, what is the model for
>> « > the joint probability?
>> « >
>> « > In that case,
>> « >
>> « > Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) =
>> « > integral(R) f(z) f(z) p( Z(i) = z ) dz
>> « >
>> « > and since pi(i,j) = integral( R ) f(z) p( Z(i) = z ) dz we have
>> « >
>> « > cov( Y(i,j), Y(i,j') ) =
>> « > integral( R ) f²(z) p( Z(i) = z ) dz -
>> « > ( integral( R ) f(z) p( Z(i) = z ) dz )²
>> « >
>> « > which in general has no reason to be zero, hence the two observations
>> « > are correlated. Is this correct?
>> « >
>> « > Is there any way to get a closed form for the covariance, for the
>> « > usual f (say, the inverse logit or probit) and Z distribution (say,
>> « > Gaussian)?
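>> « >
>> « > Numerically, though, the integral is easy to evaluate; a minimal R
>> « > sketch (cov_y is just an illustrative helper name, and the value of
>> « > Var(Z) = tau^2 is an arbitrary choice for illustration):
>> « >
>> « > tau <- 1
>> « > cov_y <- function(f, tau) {
>> « >   m2 <- integrate(function(z) f(z)^2 * dnorm(z, 0, tau), -Inf, Inf)$value
>> « >   m1 <- integrate(function(z) f(z)   * dnorm(z, 0, tau), -Inf, Inf)$value
>> « >   m2 - m1^2      # integral of f^2 p  minus  (integral of f p)^2
>> « > }
>> « > cov_y(plogis, tau)   # f = inverse logit
>> « > cov_y(pnorm,  tau)   # f = inverse probit (standard normal CDF)
>> « >
>> « > Both values come out strictly positive, consistent with the
>> « > derivation above.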
>> « >
>> « > Thanks a lot for reading, and your answers,
>> « >
>> « > --
>> « > Emmanuel CURIS
>> « > emmanuel.curis using parisdescartes.fr
>> « >
>> « > Page WWW: http://emmanuel.curis.online.fr/index.html
>> « >
>> « > _______________________________________________
>> « > R-sig-mixed-models using r-project.org mailing list
>> « > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> «
>>
>> --
>> Emmanuel CURIS
>> emmanuel.curis using parisdescartes.fr
>>
>> Page WWW: http://emmanuel.curis.online.fr/index.html
>