[R-sig-ME] Calculation of random effects for factors in R
pauljohn32 at gmail.com
Sat Aug 1 16:16:15 CEST 2015
On Tue, Jul 28, 2015 at 9:01 AM, Sudell, Maria [mesudell]
<M.E.Sudell at liverpool.ac.uk> wrote:
> I have a question concerning exactly how random effects for a factor are calculated in R. I have tried to find an answer on various R-related websites and in textbooks but cannot find a definitive explanation.
> As an example, if you had a longitudinal dataset, and you wanted to include an individual specific random effect for a smoking factor (say 3 levels, current, ex, never), how would the random effects be calculated using R? (I understand how to code this in R, I am aiming to understand the mechanics of how the function gets to the random effects).
I'm putting together class notes on this, but they are not quite
ready, or else I would give them to you.
The Pinheiro & Bates book (2000) is the classic statement on this.
The newer article that the lme4 team prepared for JSS (Journal of
Statistical Software) will answer this for you. Both are technically
demanding. I have found easier-to-read interpretations in the Gelman &
Hill 2007 book and in Ben Bolker's book Ecological Models and Data in R.
The approach is penalized maximum likelihood, in which the random
effects are conceptualized as coefficients on a random effects design
matrix. I did not realize how difficult this was to explain until I
tried with some students. After you have banged your head on a few of
these books for a while, get the 2006 book by Simon Wood on generalized
additive models. On the way to GAMs, he gives about the most beautiful
explanation of how these models are estimated that you will ever find.
That's technically challenging, but I've never seen the structure
laid out so beautifully.
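To make the "coefficients on a random-effects design matrix" idea concrete, here is a minimal numpy sketch (not lme4's code) for a random-intercept model y = X beta + Z b + e, with b ~ N(0, sigma_b^2 I) and e ~ N(0, sigma^2 I). Assuming the variance ratio k = sigma^2 / sigma_b^2 is known, beta and b are found together by penalized least squares, i.e. by solving Henderson's mixed-model equations. The data, the helper `henderson_solve`, and the value of k are hypothetical; a real fit also has to estimate k, which lme4 does by optimizing a profiled deviance (ML or REML) criterion.

```python
import numpy as np

def henderson_solve(y, X, Z, k):
    """Hypothetical helper: solve the penalized least-squares problem
        min ||y - X beta - Z b||^2 + k * ||b||^2,  with k = sigma^2 / sigma_b^2,
    via Henderson's mixed-model equations. Returns (beta_hat, b_hat)."""
    p, q = X.shape[1], Z.shape[1]
    W = np.hstack([X, Z])                  # combined design matrix [X Z]
    penalty = np.zeros((p + q, p + q))
    penalty[p:, p:] = k * np.eye(q)        # only the random effects b are penalized
    sol = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    return sol[:p], sol[p:]

# Hypothetical balanced data: 5 individuals, 4 observations each
rng = np.random.default_rng(1)
m, n = 5, 4
ids = np.repeat(np.arange(m), n)
y = 10 + rng.normal(0, 2, m)[ids] + rng.normal(0, 1, m * n)

X = np.ones((m * n, 1))                             # fixed effects: intercept only
Z = (ids[:, None] == np.arange(m)).astype(float)    # one indicator column per individual

k = 1.0 / 4.0    # assumed known ratio sigma^2 / sigma_b^2 (1 / 2^2)
beta_hat, b_hat = henderson_solve(y, X, Z, k)
```

In this balanced case the solution reproduces the textbook partial-pooling result discussed later in this message: `beta_hat` is the grand mean, and each `b_hat[j]` equals n / (n + k) times individual j's deviation from it.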
> My understanding so far would be that indicator variables for each of the levels of the factor would be included (in this case 3 indicator variables of 0,1, one for each of current, ex, never). Then coefficients for the indicator variables would be found (so for each individual in the dataset, we would end up with a coefficient for one of the indicator variables, assuming that individuals can't be in more than one group). These random coefficients (one for each individual as each individual would only fall into one smoking status) would then have their mean and variation calculated, in order to report the distribution of the random effect. Is this correct?
Not exactly. The estimate of the variance of the random effect is a
parameter estimate, and so far as I can tell, it is never linked to or
even compared against the estimates of the individual-case random
effects. That's an interesting question, though. Until you asked, I
had not thought much about it. I've never run ranef() to get the
individual random effect estimates and calculated their variance.
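The indicator-variable picture in the question is close to how the random-effects design matrix is built, but the indicators are crossed with the individuals rather than pooled. A minimal numpy sketch, assuming a term written in lme4 as `(0 + smoking | id)`, i.e. one coefficient per smoking level per individual; the data and the names `ids`, `smoking`, `J`, `S` are hypothetical. As I understand the lme4 JSS paper, Z for such a term is the row-wise Kronecker (Khatri-Rao) product of the individual indicators and the smoking indicators; the column ordering below is my own choice.

```python
import numpy as np

# Hypothetical longitudinal data: individual id and smoking status per row
ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
levels = np.array(["current", "ex", "never"])
smoking = np.array(["never", "never", "ex", "current", "current",
                    "ex", "ex", "ex"])

m = ids.max() + 1          # number of individuals
N = len(ids)               # number of observations

# Indicator matrix for individuals (N x m)
J = (ids[:, None] == np.arange(m)).astype(float)
# Indicator matrix for smoking levels (N x 3): the "0 + smoking" columns
S = (smoking[:, None] == levels).astype(float)

# Row-wise Kronecker product: row i of Z is kron(J[i], S[i]).
# Columns are grouped by individual: (id 0: current, ex, never), (id 1: ...), ...
Z = (J[:, :, None] * S[:, None, :]).reshape(N, m * len(levels))
```

Each individual gets three coefficients, modeled as b_i ~ N(0, Sigma) with Sigma a 3 x 3 covariance matrix; its variances and correlations are the parameters lmer reports, estimated from the likelihood rather than computed from ranef() afterwards. If an individual is never observed at some level (individual 1 here is always "current"), the data say nothing directly about that coefficient, and its predicted value comes only from the penalty and its correlation with that individual's other coefficients.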
Theoretically, we know the estimated random effects are a blend of the
estimates you would get if you treated each subgroup in isolation and
the estimate you get if you pool all of the data. And the sample size
within each group determines how much weight is placed on the group's
own estimate rather than the pooled one.
Since those estimates of the b's are shrunken in that way, their
variance won't necessarily coincide with the variance number at the
top of the lmer output.
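A small simulation makes the last point concrete. This is a sketch under stated assumptions: a balanced one-way random-intercept model (hypothetical data, m individuals with n observations each), for which the maximum-likelihood estimates have a closed form. Note that lmer uses REML by default, so its reported variance would differ somewhat from this ML number; the qualitative point is the same.

```python
import numpy as np

# Hypothetical balanced data: m individuals, n observations each
rng = np.random.default_rng(42)
m, n = 30, 5
sigma_b, sigma = 2.0, 3.0                    # true SDs used only to simulate
ids = np.repeat(np.arange(m), n)             # rows are grouped by individual
y = 5 + rng.normal(0, sigma_b, m)[ids] + rng.normal(0, sigma, m * n)

Y = y.reshape(m, n)                          # row j holds individual j's data
ybar = y.mean()
ybar_j = Y.mean(axis=1)                      # individual means

# Closed-form ML estimates for the balanced one-way random-effects model
ssw = ((Y - ybar_j[:, None]) ** 2).sum()     # within-individual sum of squares
ssb = n * ((ybar_j - ybar) ** 2).sum()       # between-individual sum of squares
sigma2_hat = ssw / (m * (n - 1))
tau_hat = ssb / m                            # estimates sigma^2 + n * sigma_b^2
sigma_b2_hat = max(0.0, (tau_hat - sigma2_hat) / n)

# Predicted random effects (BLUPs) at the estimates: shrunken deviations
lam = n * sigma_b2_hat / (n * sigma_b2_hat + sigma2_hat)   # shrinkage factor
b_hat = lam * (ybar_j - ybar)

print("variance parameter sigma_b^2:", sigma_b2_hat)
print("variance of predicted b's:   ", np.mean(b_hat ** 2))
```

In this balanced ML case the relationship is exact: the b's average to zero and their variance equals `lam * sigma_b2_hat`, so it falls short of the variance parameter by the shrinkage factor and approaches it only when each individual contributes many observations (lam near 1).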
Anyway, I've been reading the papers by Doug Bates and now the larger
lme4 team and they explain all this thoroughly.
> Apologies for such a simple question. Any help or explanation (or a pointer to a relevant paper or textbook) of how random effects are calculated for factors in R would be greatly appreciated.
> Many thanks
> R-sig-mixed-models at r-project.org mailing list
Paul E. Johnson
Professor, Political Science
Director, Center for Research Methods
1541 Lilac Lane, Room 504
University of Kansas