[R-sig-ME] In simple terms, how is the estimated variance of higher-level effects calculated?

Tue Jul 17 00:43:03 CEST 2012

On Mon, 16 Jul 2012, Jeremy Koster wrote:

> I'm teaching some grad students about mixed-effects modeling. To their 
> credit, they're paying close attention and asking good questions.
>
> Today, we were talking about variance components in a basic two-level 
> binomial glmer with no fixed effects.
[...]
> So if one were to describe in simple terms how lme4 generates a number 
> for the estimated variance of the random effects, what might be said?

I think conceptualizing it as a latent variable model helps.  Since the 
latent variables are unobserved, we make inferences about their 
distribution based upon the distribution of the manifest variables and our 
assumptions about the nature of the latent variable distribution.

Different assumed latent variable distributions (e.g. beta, normal, mixtures)
and different link functions (e.g. logit, probit, log, identity) will
change not only your variance estimates but also how you interpret them.
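Here is one concrete illustration of the link-function point for the original question. R's binomial family, which glmer uses, defaults to the logit link. In a random-intercept model read as a latent threshold model, the level-1 variance on the latent scale is not estimated at all. It is fixed by the link: 1 for probit (standard normal errors) and pi^2/3, about 3.29, for logit (standard logistic errors). So the same degree of clustering appears as a different random-intercept variance depending on the link. The minimal Python sketch below assumes that latent-variable reading. The function and constant names are illustrative only and are not part of lme4.

```python
import math

# Level-1 (residual) variance of the latent variable implied by each link,
# under the latent-threshold interpretation of a binomial GLMM:
# standard normal errors for probit, standard logistic errors for logit.
LATENT_RESIDUAL_VAR = {"probit": 1.0, "logit": math.pi ** 2 / 3}


def latent_icc(var_u, link):
    """Latent-scale intraclass correlation for a random-intercept binomial model."""
    return var_u / (var_u + LATENT_RESIDUAL_VAR[link])


def var_u_for_icc(icc, link):
    """Random-intercept variance that yields a given latent-scale ICC."""
    return icc * LATENT_RESIDUAL_VAR[link] / (1.0 - icc)


# The same latent ICC of 0.5 corresponds to quite different variances.
for link in ("probit", "logit"):
    print(link, round(var_u_for_icc(0.5, link), 3))
# probit 1.0
# logit 3.29
```

The scale of the fixed residual variance is what makes random-effect variances from logit and probit fits not directly comparable.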

One useful exercise might be to simulate binary data from a threshold
model, and demonstrate how the variances of the (known) latent variables
are estimated (in a probit-normal model), and how the tetrachoric
correlation, Pearson correlation and odds ratio for a 2x2 table vary with
the marginal probabilities and the strength of association.
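The threshold-model exercise can be sketched in Python using only the standard library. The sketch assumes a two-level probit-normal model with one pair of binary responses per cluster. The latent liability is y* = u + e, with u ~ N(0, sigma_u^2) and e ~ N(0, 1), and y = 1 when y* exceeds a threshold. Because the level-1 variance is fixed at 1, the latent (tetrachoric) correlation within a cluster is sigma_u^2 / (sigma_u^2 + 1), so sigma_u^2 can be recovered as rho / (1 - rho). The seed, sample size, threshold, helper names, and the Simpson's-rule evaluation of the bivariate normal CDF (via Plackett's identity) are all illustrative choices. This is not how lme4 fits the model; glmer maximizes an approximate marginal likelihood instead.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_pairs(n_clusters, sigma_u, threshold, rng):
    """One binary pair per cluster from a probit-normal threshold model."""
    pairs = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sigma_u)  # shared cluster-level latent effect
        y1 = int(u + rng.gauss(0.0, 1.0) > threshold)
        y2 = int(u + rng.gauss(0.0, 1.0) > threshold)
        pairs.append((y1, y2))
    return pairs


def table_2x2(pairs):
    """n[a][b] = number of pairs with first response a and second response b."""
    n = [[0, 0], [0, 0]]
    for a, b in pairs:
        n[a][b] += 1
    return n


def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Plackett's identity: d/drho Phi2(h, k; rho) = phi2(h, k; rho), so the
    bivariate density is integrated from 0 to rho with Simpson's rule.
    """
    def density(r):
        s = 1.0 - r * r
        return math.exp(-(h * h - 2 * r * h * k + k * k) / (2 * s)) / (2 * math.pi * math.sqrt(s))

    width = rho / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * width)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * width / 3


def tetrachoric(n):
    """Latent correlation that reproduces the observed (0, 0) cell frequency."""
    total = sum(map(sum, n))
    h = STD_NORMAL.inv_cdf((n[0][0] + n[0][1]) / total)  # threshold, first response
    k = STD_NORMAL.inv_cdf((n[0][0] + n[1][0]) / total)  # threshold, second response
    target = n[0][0] / total
    lo, hi = -0.99, 0.99
    for _ in range(50):  # bisection; bvn_cdf increases with rho
        mid = (lo + hi) / 2
        if bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def phi_coefficient(n):
    """Pearson correlation between the two 0/1 responses."""
    num = n[1][1] * n[0][0] - n[1][0] * n[0][1]
    den = math.sqrt((n[0][0] + n[0][1]) * (n[1][0] + n[1][1])
                    * (n[0][0] + n[1][0]) * (n[0][1] + n[1][1]))
    return num / den


def odds_ratio(n):
    return n[0][0] * n[1][1] / (n[0][1] * n[1][0])


rng = random.Random(2012)
counts = table_2x2(simulate_pairs(20000, sigma_u=1.0, threshold=0.5, rng=rng))
rho = tetrachoric(counts)
print("tetrachoric", round(rho, 3), "-> sigma_u^2 estimate", round(rho / (1 - rho), 3))
print("Pearson (phi)", round(phi_coefficient(counts), 3),
      "odds ratio", round(odds_ratio(counts), 2))
```

With sigma_u = 1 the true latent correlation is 0.5. The tetrachoric estimate should land near it, while the Pearson (phi) correlation of the same 0/1 data is noticeably smaller. Varying `threshold` (the marginal probability) and `sigma_u` (the association strength) shows how the three measures move differently.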

You might also compare different models for this "classic"
boric acid teratogenicity dataset:

http://genepi.qimr.edu.au/staff/davidD/Sib-pair/Documents/Using_Sib-pair/Scripts/boricex.in

A final example might be to look at the commonly used approach of fitting
an LMM to binary data coded as 1s and 0s (going back to Cochran 1943),
and whether its results are misleading or not.  In analysis of Genome-Wide
Association Scan data for a binary phenotype Y, we test the (fixed) effect
of each measured polymorphism X (usually scored as 0, 1, 2) against Y, but
we need to adjust for confounding due to unobserved relatedness of the
individuals in the study. That relatedness is estimated as an NxN empirical
kinship matrix (the average pairwise correlation over M polymorphisms
between the N study participants, with M = 2,000,000 to 5,000,000 and
N = 1,000 to 100,000).  When Y is continuous, an LMM is a very attractive
approach...
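The empirical kinship matrix can be illustrated with a small Python sketch. It takes the "average pairwise correlation over M polymorphisms" description literally. Each polymorphism is standardized to mean 0 and variance 1 across participants, and K[i][j] is the average of z[i][m] * z[j][m] over polymorphisms. This standardization is an assumption. Genetics software often standardizes by allele frequency instead, for example dividing by sqrt(2p(1 - p)). The function name and the toy genotypes are hypothetical. At the M and N quoted above one would use blocked matrix algebra rather than pure Python loops.

```python
import math


def empirical_kinship(genotypes):
    """N x N kinship matrix from an N x M genotype matrix scored 0, 1, 2.

    Each polymorphism (column) is standardised to mean 0 and variance 1
    across participants; K[i][j] is the average over polymorphisms of
    z[i][m] * z[j][m]. Monomorphic columns carry no information and are
    skipped.
    """
    n = len(genotypes)
    columns = []
    for m in range(len(genotypes[0])):
        col = [row[m] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd > 0:
            columns.append([(x - mean) / sd for x in col])
    used = len(columns)
    return [[sum(z[i] * z[j] for z in columns) / used for j in range(n)]
            for i in range(n)]


genos = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0],  # duplicate of participant 0 (e.g. a monozygotic twin)
    [2, 1, 0, 0, 1],
    [1, 2, 1, 1, 2],
]
K = empirical_kinship(genos)
for row in K:
    print([round(v, 2) for v in row])
```

With this standardization the diagonal averages exactly 1 and every row sums to 0. Identical genotypes give an off-diagonal entry equal to the diagonal.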

-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v