[R-sig-ME] In simple terms, how is the estimated variance of higher-level effects calculated?

Tue Jul 17 19:13:48 CEST 2012

Thanks for the feedback, David.  Owing to my lack of expertise, I confess that I didn't completely follow everything, but your email inspired me to explore the varying intercepts with different exponential families, focusing particularly on the gaussian.

In short, my students were looking at the estimated varying intercepts for each higher-level group (or the "BLUP's", as some people seem to call them) -- the intercepts that one can see by entering:

ranef (fm1)

Their suggestion was that, if one calculates the variance of that vector of estimated intercepts, then one should get the number that gets reported by lme4 for the higher-level variance (for example, 0.93537 in my earlier email).

We did this for several two-level models, and although the variance we calculate from the group-specific intercepts is always in the same ballpark as the lme4 output, it's never on target.

So I've been scouring all the resources I have at hand: Gelman and Hill, Ben Bolker's book, Snijders and Bosker, etc.

Most of these seem to say: "And this is the variance estimate and how to interpret it" . . . without going into non-technical details about how exactly it's calculated.  That's where we're stuck.  I know that the intercepts are calculated via partial pooling and a shrinkage factor, but it's not clear how that relates to the estimated variance.

----- Original Message -----
From: David Duffy <David.Duffy at qimr.edu.au>
To: Jeremy Koster <helixed2 at yahoo.com>
Cc: r-sig-mixed-models at r-project.org
Sent: Monday, July 16, 2012 6:43 PM
Subject: Re: [R-sig-ME] In simple terms,how is the estimated variance of higher-level effects calculated?

On Mon, 16 Jul 2012, Jeremy Koster wrote:

> I'm teaching some grad students about mixed-effects modeling. To their credit, they're paying close attention and asking good questions.
> 
> Today, we were talking about variance components in a basic two-level binomial glmer with no fixed effects.
[...]
> So if one were to describe in simple terms how lme4 generates a number for the estimated variance of the random effects, what might be said?

I think conceptualizing it as a latent variable model helps.  Since the latent variables are unobserved, we make inferences about their distribution based upon the distribution of the manifest variables and our assumptions about the nature of the latent variable distribution.

Different assumed latent variable distributions eg beta, normal, mixtures - and different link functions eg logit, probit, log, identity - will change not only your variance estimates, but your interpretation.

One useful exercise might be to simulate binary data from a threshold model, and demonstrate how it is that the variances of the (known) latent variables are estimated (in a probit-normal model), and how the tetrachoric correlation, Pearson correlation and odds ratio for a 2x2 table vary by marginal probabilities and association strength.

You might also compare different models for this "classic"
boric acid teratogenicity dataset:

http://genepi.qimr.edu.au/staff/davidD/Sib-pair/Documents/Using_Sib-pair/Scripts/boricex.in

A final example might be to look at the commonly used approach of fitting a LMM to binary data coded as 1's and 0's (going back to Cochrane 1943), and whether results are deceptive or not.  In analysis of Genome Wide Association Scan data for a binary phenotype Y, we test the (fixed) effect of each measured polymorphism X (usually scored as 0,1,2) against Y, but we need to adjust for confounding due to unobserved relatedness of individuals in the study. The latter is estimated as an NxN empirical kinship matrix (the average pairwise correlation over M polymorphisms between N study participants, with M=2000000 to 5000000, and N = 1000 to 100000).  When Y is continuous, a LMM is a very attractive approach...

-- | David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v