[R-sig-ME] Precision about the glmer model for Bernoulli variables

Emmanuel Curis emmanuel.curis at parisdescartes.fr
Tue Apr 21 11:19:21 CEST 2020


Thanks for all these details about correlation.

Just to be clear about the nomenclature: when a model is called
« A-B », A refers to the cumulative distribution function used as the
link between pi and the linear predictor (including the random
effect(s)), or to its reciprocal, and B refers to the distribution of
the random effect(s), is this right?

If this is right, then for the « double binomial » it would mean that
the link function is a step function, and not a strictly increasing
one. Since it would then not be a bijection between R and [0,1],
doesn't it introduce indeterminacy and strong constraints on the
possible values of pi? I guess I am misinterpreting what « double
binomial » means.

Sorry for these naive questions, I'm discovering the field...

Best regards,


On Tue, Apr 21, 2020 at 07:59:48AM +0000, John Maindonald wrote:
> Actually, with a suitable parameterization, the betabinomial allows
> small negative correlations.  See
>   R. L. Prentice (1986), "Binary Regression Using an Extended
>   Beta-Binomial Distribution, With Discussion of Correlation Induced
>   by Covariate Measurement Errors", Journal of the American
>   Statistical Association, Vol. 81, No. 394, pp. 321-327.
> 
> A feature of the betabinomial (BB), which ought to be better advertised than
> it is in most accounts that I have seen, is that, for positive correlation rho,
> it sets a strict lower limit of pi(1-pi)(1+rho) on the variance of the estimate
> of the proportion pi.  This is in marked contrast to the variance
> assumptions that define quasibinomial errors.
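The quoted variance point can be sanity-checked numerically. A minimal
sketch (in Python with scipy rather than R, purely so the check is
self-contained; the values n = 20, a = 2, b = 6 are arbitrary) comparing
the beta-binomial variance formula n·pi·(1-pi)·(1+(n-1)·rho) against
scipy's implementation:

```python
from scipy import stats

# Beta-binomial(n, a, b): mean proportion pi = a/(a+b),
# intraclass correlation rho = 1/(a+b+1).
n, a, b = 20, 2.0, 6.0          # arbitrary illustrative values
pi_ = a / (a + b)
rho = 1.0 / (a + b + 1.0)

# Overdispersed variance of the count: n*pi*(1-pi)*(1 + (n-1)*rho)
var_formula = n * pi_ * (1 - pi_) * (1 + (n - 1) * rho)
var_scipy = stats.betabinom(n, a, b).var()

print(var_formula, var_scipy)   # the two should agree
print(n * pi_ * (1 - pi_))      # plain binomial variance, strictly smaller
```

For positive rho the beta-binomial variance always exceeds the plain
binomial variance n·pi·(1-pi), which is the kind of overdispersion the
quoted bound is about.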
> 
> The gamlss package implements the double binomial (as well
> as the logistic normal and BB), but as far as I can tell, not in a
> multi-level model context.  The double binomial does allow a
> negative correlation.
> 
> 
> John Maindonald             email: john.maindonald using anu.edu.au
> 
> 
> On 21/04/2020, at 19:18, David Duffy <David.Duffy using qimrberghofer.edu.au> wrote:
> 
> You'd know the beta-binomial constrains the correlation coefficient to be
> non-negative, but the logit-normal and probit-normal allow a negative
> correlation, with a lower bound constrained by the N in the case of
> regular-sized clusters. For your example of 2x2 tables, the correlation can
> go from -1 to 1. This is the usual genetics-type setup, where you model the
> pairwise correlations for all pairs of observations in the dataset against
> their (pairwise) genetic relatedness.  In multi-trait models, the
> correlations between different phenotypes (say asthma and diabetes) in the
> same individual can be negative, so you can estimate negative variance
> components.
> 
> The pedigreemm package extends lme4 to allow glmms for pedigree data (see the mastitis example for a binary trait).
> 
> Cheers, David Duffy.
> ________________________________________
> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> on behalf of Vaida, Florin <fvaida using health.ucsd.edu>
> Sent: Tuesday, 21 April 2020 7:05:40 AM
> To: Emmanuel Curis
> Cc: r-sig-mixed-models using r-project.org
> Subject: Re: [R-sig-ME]  Precision about the glmer model for Bernoulli variables
> 
> Hi Emmanuel,
> 
> That's a good question.  My guess is that the correlation is non-negative generally, but I wasn't able to prove it theoretically even in the simplest case where Y1, Y2 ~ Bernoulli(f(u)) are independent conditionally on u, with f the inverse link and u ~ Normal(0, 1).  I am curious if someone has a solution.
> We can't go too far down this route in this forum, since Doug wants to keep it applied.
> 
> Florin
> 
> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
> 
> Hi Florin,
> 
> Thanks for the answer, the precision about p(i,j), and the reference.
> 
> A last question, that I forgot in my message: is the obtained
> correlation also always positive, as in the linear case? Or may some
> negative correlation appear, depending on the values of pi(i,j) and
> pi(i,j')?
> 
> Best regards,
> 
> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
> « Hi Emmanuel,
> «
> « Your reasoning is correct.
> «
> « As a quibble, outside a simple repeated measures experiment setup, p(i,j) *does* depend on j.
> « For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
> «
> « There is almost certainly no closed form solution for the covariance under logit.
> « I am not sure about the probit (my guess is not).
> « There will be some Laplace approximations available, a la Breslow and Clayton 1993.
> «
> « I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.
> «
> « Florin
> «
> «
« > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis using parisdescartes.fr> wrote:
> « >
> « > Hello everyone,
> « >
> « > I hope you're all going fine in these difficult times.
> « >
« > I tried to understand in detail the exact model used by glmer
« > for a Bernoulli experiment, by comparison with the linear mixed
« > effects model, and especially how it introduces correlations between
« > observations of a given group.  I think I finally got it, but could
« > you check that what I write below is correct and that I'm not missing
« > something?
> « >
« > I use a very simple case with only a single random effect and no
« > fixed effects, because I guess that adding fixed effects or other
« > random effects does not change the idea, it "just" makes the
« > formulas more complex.  I denote by i the random-effect level, let's
« > say « patient », and by j the observation for this patient.
> « >
« > In the linear model, we have Y(i,j) = µ0 + Z(i) + epsilon(i,j), with
« > Z(i) and epsilon(i,j) random variables having probability densities,
« > independent of each other, and each iid.
> « >
> « > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
> « > positive correlation between observations of the same patient.
> « >
> « >
> « >
« > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
« > f being the inverse link function, typically the inverse of the
« > logit. So we have
> « >
> « > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i, j') ) - E( Y(i,j) ) E( Y(i,j') )
> « >     = Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
> « >
> « > Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
> « >
> « > Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) =
> « >  integral(R) Pr( Y(i,j) = 1 inter Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
> « >
« > Then, we assume that conditionally on Z(i), the Y(i,j) are
« > independent, is this right? This is the equivalent of « the
« > epsilon(i,j) are independent »? I assume this hypothesis is also used
« > for computing the likelihood? If not, what is the model for the joint
« > probability?
> « >
> « > In that case,
> « >
> « > Pr( Y(i,j) = 1 inter Y(i,j') = 1 ) =
> « >  integral(R) f(z) f(z) p( Z(i) = z ) dz
> « >
> « > and since pi(i,j) = integral( R ) f(z) p( Z(i) = z ) dz we have
> « >
> « > cov( Y(i,j), Y(i,j') ) =
> « > integral( R ) f²(z) p( Z(i) = z ) dz -
> « >  ( integral( R ) f(z) p( Z(i) = z ) dz )²
> « >
« > which in general has no reason to be zero, hence the two observations
« > are correlated. Is this correct?
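One remark on the sign question raised elsewhere in the thread: in this
exchangeable case the last expression is exactly Var( f(Z(i)) ), a
variance, so the covariance is necessarily non-negative. A small
numerical check (sketched in Python with scipy so it is self-contained;
the logit link and sigma = 1 are arbitrary choices):

```python
from scipy import integrate, special, stats

sigma = 1.0                     # arbitrary random-effect standard deviation
f = special.expit               # inverse logit link
phi = stats.norm(0, sigma).pdf  # density of Z(i)

# E[f(Z)] and E[f(Z)^2] by numerical integration over the random effect
Ef, _ = integrate.quad(lambda z: f(z) * phi(z), -10 * sigma, 10 * sigma)
Ef2, _ = integrate.quad(lambda z: f(z) ** 2 * phi(z), -10 * sigma, 10 * sigma)

cov = Ef2 - Ef ** 2             # = Var(f(Z)) >= 0
print(Ef, cov)
```

By symmetry of the inverse logit around 0, Ef is 0.5 here, and cov is
strictly positive whenever sigma > 0.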
> « >
> « > Is there any way to have a closed-form of the covariance, for usual f
> « > (let's say, logit or probit) and Z distribution (let's say, Gaussian)?
> « >
> « > Thanks a lot for reading, and your answers,
> « >
> « > --
> « >                                Emmanuel CURIS
> « >                                emmanuel.curis using parisdescartes.fr<mailto:emmanuel.curis using parisdescartes.fr>
> « >
> « > Page WWW: http://emmanuel.curis.online.fr/index.html
> « >
> « > _______________________________________________
« > R-sig-mixed-models using r-project.org mailing list
> « > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> «
> 
> --
>                               Emmanuel CURIS
>                               emmanuel.curis using parisdescartes.fr
> 
> Page WWW: http://emmanuel.curis.online.fr/index.html
> 
> 
> 

-- 
                                Emmanuel CURIS
                                emmanuel.curis using parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html


