# [R-sig-ME] In simple terms, how is the estimated variance of higher-level effects calculated?

David Duffy David.Duffy at qimr.edu.au
Tue Jul 17 00:43:03 CEST 2012

```
On Mon, 16 Jul 2012, Jeremy Koster wrote:

> I'm teaching some grad students about mixed-effects modeling. To their
> credit, they're paying close attention and asking good questions.
>
> Today, we were talking about variance components in a basic two-level
> binomial glmer with no fixed effects.
[...]
> So if one were to describe in simple terms how lme4 generates a number
> for the estimated variance of the random effects, what might be said?

I think conceptualizing it as a latent variable model helps.  Since the
latent variables are unobserved, we make inferences about their
distribution based upon the distribution of the manifest variables and our
assumptions about the nature of the latent variable distribution.

Different assumed latent variable distributions (e.g. beta, normal,
mixtures) and different link functions (e.g. logit, probit, log, identity)
will change not only your variance estimates but also your interpretation.

One useful exercise might be to simulate binary data from a threshold
model and demonstrate how the variances of the (known) latent variables
are estimated (in a probit-normal model), and how the tetrachoric
correlation, Pearson correlation and odds ratio for a 2x2 table vary with
the marginal probabilities and the strength of association.

You might also compare different models for this "classic"
boric acid teratogenicity dataset:

http://genepi.qimr.edu.au/staff/davidD/Sib-pair/Documents/Using_Sib-pair/Scripts/boricex.in

A final example might be to look at the commonly used approach of fitting
an LMM to binary data coded as 1s and 0s (going back to Cochran 1943),
and whether the results are deceptive or not.  In analysis of Genome-Wide
Association Scan (GWAS) data for a binary phenotype Y, we test the (fixed)
effect of each measured polymorphism X (usually scored as 0, 1, 2) against
Y, but we need to adjust for confounding due to unobserved relatedness of
individuals in the study. This relatedness is estimated as an NxN empirical
kinship matrix (the average pairwise correlation over M polymorphisms
between N study participants, with M = 2,000,000 to 5,000,000 and
N = 1,000 to 100,000).  When Y is continuous, an LMM is a very attractive
approach...

--
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

```
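The latent-variable view in the reply can be made concrete for the two-level binomial model in the question. The sketch below is a hypothetical Python illustration, not lme4 code. It simulates probit-normal data, where y = 1 when mu + u_j + e_ij > 0, with u_j ~ N(0, sigma2_u) and e_ij ~ N(0, 1). It then recovers sigma2_u from the tetrachoric correlation of pairs of observations in the same cluster, using rho = sigma2_u / (sigma2_u + 1). This pairwise moment argument is only meant to show where the information about the variance comes from. `glmer` itself maximizes a Laplace (or adaptive Gauss-Hermite) approximation to the marginal likelihood. All function names and simulation settings here are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def upper_orthant(h, k, rho, n=2000):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * P(Y > k | X = x) over x > h with Simpson's rule (n even).
    """
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = h, max(h, 0.0) + 10.0  # the density is negligible beyond ~10 sd
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        f = STD.pdf(x) * (1.0 - STD.cdf((k - rho * x) / s))
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * f
    return total * step / 3.0


def tetrachoric(p11, p1, p2):
    """Latent correlation reproducing P(both = 1) = p11 given marginals p1, p2."""
    h, k = STD.inv_cdf(1.0 - p1), STD.inv_cdf(1.0 - p2)
    lo, hi = -0.999, 0.999
    for _ in range(50):  # bisection: the orthant probability increases with rho
        mid = 0.5 * (lo + hi)
        if upper_orthant(h, k, mid) < p11:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(n_clusters, n_per, mu, sigma2_u, seed=1):
    """Hypothetical two-level probit-normal data: y = 1 if mu + u_j + e_ij > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, math.sqrt(sigma2_u))
        data.append([1 if mu + u + rng.gauss(0.0, 1.0) > 0 else 0
                     for _ in range(n_per)])
    return data


def pairwise_estimate(data):
    """Moment-style estimates of (mu, sigma2_u) from within-cluster pairs.

    Assumes every cluster has the same size.
    """
    n_per = len(data[0])
    ones = [sum(cluster) for cluster in data]
    p = sum(ones) / (len(data) * n_per)
    # share of ordered within-cluster pairs (i != j) in which both are 1
    p11 = sum(c * (c - 1) for c in ones) / (len(data) * n_per * (n_per - 1))
    rho = tetrachoric(p11, p, p)       # latent within-cluster correlation
    sigma2_u = rho / (1.0 - rho)       # since rho = sigma2_u / (sigma2_u + 1)
    # P(y = 1) = Phi(mu / sqrt(1 + sigma2_u)), solved for mu
    mu = STD.inv_cdf(p) * math.sqrt(1.0 + sigma2_u)
    return mu, sigma2_u


data = simulate(n_clusters=2000, n_per=10, mu=-0.3, sigma2_u=0.5)
mu_hat, sigma2_hat = pairwise_estimate(data)
# values used to generate the data: mu = -0.3, sigma2_u = 0.5
print(f"mu_hat = {mu_hat:.3f}, sigma2_u_hat = {sigma2_hat:.3f}")
```

The estimates should land near the simulated values. The cluster variance is identified only through how much more often two members of the same cluster are both 1 than two independent draws would be.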
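On the latent scale, the link function fixes the residual variance. It is π²/3 ≈ 3.29 for the standard logistic (logit link) and 1 for the standard normal (probit link). That is one concrete way the choice of link changes both the variance estimate and its interpretation. The sketch below uses hypothetical helper names. It computes the latent-scale intra-class correlation sigma2_u / (sigma2_u + residual variance). It then shows that a given latent ICC corresponds to a random-intercept variance π²/3 times larger under logit than under probit.

```python
import math

# Latent residual variance implied by each link in a latent-threshold model:
# standard logistic for logit, standard normal for probit.
RESIDUAL_VARIANCE = {"logit": math.pi ** 2 / 3, "probit": 1.0}


def latent_icc(sigma2_u, link):
    """Latent-scale intra-class correlation for a random-intercept binomial GLMM."""
    return sigma2_u / (sigma2_u + RESIDUAL_VARIANCE[link])


def equivalent_variance(sigma2_u, from_link, to_link):
    """Variance on `to_link` giving the same latent ICC as sigma2_u on `from_link`."""
    return sigma2_u * RESIDUAL_VARIANCE[to_link] / RESIDUAL_VARIANCE[from_link]


sigma2_probit = 0.5
sigma2_logit = equivalent_variance(sigma2_probit, "probit", "logit")
print(round(sigma2_logit, 3),
      round(latent_icc(sigma2_probit, "probit"), 3),
      round(latent_icc(sigma2_logit, "logit"), 3))  # 1.645 0.333 0.333
```

The exact factor belongs to this latent-scale bookkeeping. Variances from logit and probit models fitted to the same data need not differ by exactly this ratio, because the logistic distribution is not a rescaled normal.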
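The suggested 2x2 exercise can also be done exactly rather than by simulation. Dichotomize a standard bivariate normal with (tetrachoric) correlation rho at thresholds chosen to give the desired marginal probabilities. Then compute the Pearson (phi) correlation and odds ratio of the resulting table. The sketch below is a hypothetical Python version. It evaluates the bivariate normal orthant probability by Simpson's rule, and its grid of rho values and marginals is arbitrary.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def upper_orthant(h, k, rho, n=2000):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = h, max(h, 0.0) + 10.0  # the density is negligible beyond ~10 sd
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):  # Simpson's rule on phi(x) * P(Y > k | X = x)
        x = lo + i * step
        f = STD.pdf(x) * (1.0 - STD.cdf((k - rho * x) / s))
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * f
    return total * step / 3.0


def two_by_two(p1, p2, rho):
    """Cells (p11, p10, p01, p00) from dichotomizing a bivariate normal with
    correlation rho so that P(first = 1) = p1 and P(second = 1) = p2."""
    h, k = STD.inv_cdf(1.0 - p1), STD.inv_cdf(1.0 - p2)
    p11 = upper_orthant(h, k, rho)
    return p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11


def phi_coefficient(cells, p1, p2):
    """Pearson correlation of the two 0/1 variables."""
    return (cells[0] - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))


def odds_ratio(cells):
    p11, p10, p01, p00 = cells
    return p11 * p00 / (p10 * p01)


for rho in (0.2, 0.5, 0.8):
    for p in (0.5, 0.2, 0.05):
        cells = two_by_two(p, p, rho)
        print(f"tetrachoric={rho:.1f}  P(Y=1)={p:.2f}  "
              f"phi={phi_coefficient(cells, p, p):.3f}  OR={odds_ratio(cells):.2f}")
```

With rho = 0.5 and both marginals at 0.5, the table has P(1,1) = 1/3, so phi = 1/3 and the odds ratio is 4. At the same rho, moving both marginals to 0.05 lowers phi and raises the odds ratio. The latent correlation is the same in both tables, but the two manifest summaries move in opposite directions.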
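For the LMM-on-0/1-data question, a one-way random-effects LMM fitted to 0/1 data estimates the intra-class correlation on the 0/1 scale. Under a probit-normal generating model, that quantity depends on the prevalence even when the latent ICC is held fixed. The sketch below is a hypothetical Python illustration. It uses the balanced one-way ANOVA (method-of-moments) estimator as a stand-in for fitting an LMM to the 0/1 outcome. For balanced data this coincides with the REML estimate when that estimate is positive. The cluster counts, cluster size and latent ICC of 0.3 are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def simulate_binary(n_clusters, n_per, prevalence, latent_icc, seed=2):
    """Hypothetical probit-normal clusters with a given latent ICC and P(y = 1)."""
    rng = random.Random(seed)
    # latent z = u + e with Var(u) = latent_icc and Var(e) = 1 - latent_icc
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, math.sqrt(latent_icc))
        data.append([1 if u + rng.gauss(0.0, math.sqrt(1.0 - latent_icc)) > threshold
                     else 0 for _ in range(n_per)])
    return data


def anova_icc(data):
    """Balanced one-way ANOVA estimate of the ICC on the scale of the data."""
    a, n = len(data), len(data[0])
    means = [sum(c) / n for c in data]
    grand = sum(means) / a
    msb = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((y - m) ** 2 for c, m in zip(data, means) for y in c) / (a * (n - 1))
    sigma2_between = max((msb - msw) / n, 0.0)  # truncated at zero, as REML would be
    return sigma2_between / (sigma2_between + msw)


results = {}
for prevalence in (0.5, 0.2, 0.05):
    results[prevalence] = anova_icc(simulate_binary(3000, 10, prevalence, latent_icc=0.3))
    print(f"P(y=1)={prevalence:.2f}  latent ICC=0.30  "
          f"0/1-scale ICC={results[prevalence]:.3f}")
```

Dichotomizing attenuates correlation, so each 0/1-scale ICC should come out below the latent 0.30 and shrink as the outcome becomes rarer. Whether that is "deceptive" depends on which scale the scientific question refers to.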
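The empirical kinship matrix described for GWAS can be sketched as follows. The sketch assumes each polymorphism is standardized by its sample mean and standard deviation. Each entry is then an average over loci of products of two people's standardized genotypes, which is a correlation-type measure of relatedness. This hypothetical pure-Python function only shows the arithmetic on a toy table. At the scales quoted (millions of loci, up to 100,000 people), real software relies on blocked matrix products. A common variant also scales each locus by 2p(1 - p), computed from the allele frequency p, rather than by the sample variance.

```python
import math
import random


def empirical_kinship(genotypes):
    """Correlation-type kinship matrix from an N x M table of 0/1/2 genotypes.

    Each polymorphism (column) is centered and scaled to unit variance across
    people; K[a][b] is the average over polymorphisms of the product of person
    a's and person b's standardized scores.
    """
    n, m = len(genotypes), len(genotypes[0])
    columns = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        if sd > 0:  # monomorphic loci carry no information and are skipped
            columns.append([(g - mean) / sd for g in col])
    m_used = len(columns)
    return [[sum(z[a] * z[b] for z in columns) / m_used for b in range(n)]
            for a in range(n)]


# Toy data: 20 unrelated people at 2,000 loci with hypothetical allele
# frequencies, plus an exact copy of person 0 (e.g. one sample genotyped twice).
rng = random.Random(3)
freqs = [rng.uniform(0.05, 0.5) for _ in range(2000)]
people = [[sum(rng.random() < f for _ in range(2)) for f in freqs] for _ in range(20)]
people.append(list(people[0]))
K = empirical_kinship(people)
print(round(K[0][0], 3), round(K[0][20], 3), round(K[0][1], 3))
```

The duplicated sample gets an off-diagonal value equal to person 0's diagonal entry. Pairs of independently simulated people should sit near zero, and the diagonal averages exactly 1 because of the standardization.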