[R-sig-ME] Mixed-model-binary logistic model with dependence between individual repeated measures

Sat Jan 8 00:19:19 CET 2011

On Fri, 7 Jan 2011, Martin Maechler wrote:

>>>>>> Ben Bolker <bbolker at gmail.com>
>>>>>>     on Fri, 07 Jan 2011 11:49:31 -0500 writes:
>
>    > On 11-01-07 11:35 AM, Anna Ekman wrote:
>    >> Ben Bolker, thank you for your suggestions.
>    >>
>    >> Yes, it is surprising that in SAS and STATA I have to assume
>    >> independence between the measurements within an individual.
>
>    > It's fundamentally a bit hard to specify correlation among individuals
>    > in a non-normal model. One option is to go completely to the marginal
>    > specification (which you said you don't want to do); probably the most
>    > sensible statistical formulation is
>
>    > (fixed effects)  eta0 = X*beta
>    > (random effects) eta1 ~ MVN(mu=X*beta,Sigma=(something sensible such
>    > as AR(1) within individuals))
>    > y ~ Bernoulli(eta1)
>
> Interesting... {I've been "taught" in the past that correlation
>                specification for non-normal, i.e. GLME models,
>                would not make sense / be possible,
>                something you do not seem to support ...
> }
>
> Does the above mean {slight changes}
>
> (fixed effects)  eta0 = X*beta
> (random effects) eta1 ~ MVN(0, Sigma=(something sensible such
> 				       as AR(1) within individuals))
>  (Y | X,eta1)  ~ Bernoulli( logit(eta0 + eta1) )
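
To make the specification just quoted concrete, here is a minimal simulation sketch in Python (standard library only). It assumes the link is read as the inverse logit, i.e. P(Y=1) = 1/(1+exp(-(eta0+eta1))), and that "AR(1) within individuals" means a stationary AR(1) process with marginal standard deviation sigma. The function names and the parameter values in the usage lines are purely illustrative and are not taken from any package.

```python
import math
import random


def ar1_cov(n, rho, sigma):
    """Covariance matrix Sigma with entries sigma^2 * rho^|i-j| (AR(1))."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def simulate_individual(x_rows, beta, rho, sigma, rng):
    """Draw one individual's binary responses.

    eta0 = X*beta is the fixed part; eta1 is a zero-mean stationary AR(1)
    sequence whose covariance is ar1_cov(len(x_rows), rho, sigma).
    Requires -1 < rho < 1.
    """
    ys = []
    eta1 = rng.gauss(0.0, sigma)  # stationary starting value
    for t, x in enumerate(x_rows):
        if t > 0:
            # AR(1) update that keeps the marginal variance at sigma^2
            eta1 = rho * eta1 + rng.gauss(0.0, sigma * math.sqrt(1.0 - rho ** 2))
        eta0 = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-(eta0 + eta1)))  # inverse logit
        ys.append(1 if rng.random() < p else 0)
    return ys


# Illustrative usage: intercept plus a time trend, five occasions.
rng = random.Random(2011)
x_rows = [[1.0, float(t)] for t in range(5)]
y = simulate_individual(x_rows, beta=[0.2, -0.1], rho=0.6, sigma=1.0, rng=rng)
```

This only generates data from the model; fitting it requires integrating over eta1, which is where the computational difficulty lies.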

With the probit link, such mixed models for dichotomous and ordinal 
variables have a long history in genetics and psychometrics.  In the 
latter case, factor analysis and path analysis of tetrachoric/polychoric 
correlations are completely equivalent to the probit-normal model, 
although GLS/WLS was often used for computational reasons.  We used to do 
all this in LISREL.  For the case of varying numbers of observations per 
individual (and other irregular data types), you can use the "multiple 
groups" approach, where you specify a covariance matrix of the right size 
for each pattern of data, and constrain the correlations to be equal 
across the different groups.  Since the main interest is in the 
correlations between latent variables, all hypotheses and estimates are 
usually framed at that "level" of the model.
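
For readers who have not met the tetrachoric correlation, the following rough Python sketch (standard library only) shows the idea for a single 2x2 table: each binary variable is treated as a thresholded standard normal, and the latent correlation is the value of rho that reproduces the observed "both positive" proportion. The function names, the midpoint-rule integration and the bisection search are my own illustrative choices, not what LISREL or any other package does internally, and the sketch assumes no margin of the table is empty.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def both_above(t1, t2, rho, steps=2000, upper=8.0):
    """P(Z1 > t1, Z2 > t2) for a standard bivariate normal with correlation rho.

    Midpoint-rule integral over z of phi(z) * P(Z2 > t2 | Z1 = z),
    truncated at `upper`.
    """
    h = (upper - t1) / steps
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(steps):
        z = t1 + (i + 0.5) * h
        total += STD.pdf(z) * (1.0 - STD.cdf((t2 - rho * z) / s))
    return total * h


def tetrachoric(n11, n10, n01, n00, tol=1e-6):
    """Latent correlation implied by a 2x2 table (n11 = both positive)."""
    n = n11 + n10 + n01 + n00
    t1 = STD.inv_cdf(1.0 - (n11 + n10) / n)  # threshold for variable 1
    t2 = STD.inv_cdf(1.0 - (n11 + n01) / n)  # threshold for variable 2
    target = n11 / n
    lo, hi = -0.999, 0.999
    # both_above is increasing in rho, so bisection finds the match
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if both_above(t1, t2, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Illustrative table: both margins 50%, one third of pairs both positive.
r = tetrachoric(200, 100, 100, 200)
```

With both thresholds at zero the joint probability is 1/4 + asin(rho)/(2*pi), so the illustrative table corresponds to a latent correlation of about 0.5.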

In the genetic situation, for example, we might estimate the heritability 
of a dichotomous trait based on family data under a polygenic model as 
being twice the sibling tetrachoric correlation.  Model criticism is done by 
comparing predicted with observed risks to relatives of different degrees 
of an affected individual, or of a set of affected relatives.  Practically, 
this was used for genetic counselling etc.  In the current era of 
genome-wide association studies, a key question is the "missing 
heritability", i.e. the amount of familial aggregation of diseases 
unexplained by gene variants with detectable effect: the case-control 
studies have N=30000.  Some of the arguments hinge on what kind of link 
function is used in the theoretical model.
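
As an illustration of the kind of predicted risk used in that model criticism, here is a rough Python sketch (standard library only) under the liability-threshold model with a probit link. It assumes a purely additive polygenic model, so that the liability correlation between relatives of degree d is h2 * (1/2)^d, and it takes the population prevalence as known. The function names and the prevalence and heritability values in the usage lines are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def joint_affected(threshold, r, steps=4000, upper=8.0):
    """P(L1 > t, L2 > t) for standard normal liabilities with correlation r."""
    h = (upper - threshold) / steps
    s = math.sqrt(1.0 - r * r)
    total = 0.0
    for i in range(steps):
        z = threshold + (i + 0.5) * h
        total += STD.pdf(z) * (1.0 - STD.cdf((threshold - r * z) / s))
    return total * h


def recurrence_risk(prevalence, h2, degree):
    """Predicted risk to a relative of the given degree (>= 1) of one
    affected individual, assuming additive polygenic liability."""
    t = STD.inv_cdf(1.0 - prevalence)  # liability threshold
    r = h2 * 0.5 ** degree             # liability correlation between relatives
    return joint_affected(t, r) / prevalence


# Hypothetical trait: 1% prevalence, heritability 0.8.
sib_risk = recurrence_risk(0.01, 0.8, degree=1)
second_degree_risk = recurrence_risk(0.01, 0.8, degree=2)
```

Swapping the normal liability for another latent distribution (i.e. another link) changes these predicted risks, which is one way the choice of link function enters the argument.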

Sorry, I couldn't resist ;)

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v