[R-sig-ME] marginality principle for mixed effects models

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Mon Apr 6 20:30:05 CEST 2009

Le vendredi 03 avril 2009 à 19:17 -0700, Steven McKinney a écrit :
> Dear list,
> I'm working on mapping all my old-school fixed effects linear
> modeling teachings into the fabulous new(er) mixed effects world.
> I've been looking at the "Oats" data used in the nlme library.
> Here's a set of commands that build successively richer models,
> and tests to evaluate differences due to variety.
> library("nlme")
> library("lme4")
> plot(Oats, inner = TRUE)
> print(om1 <- lmer(yield ~ nitro + (1 | Block), data = Oats))
> print(om2 <- lmer(yield ~ nitro + Variety + (1 | Block), data = Oats))
> print(om3 <- lmer(yield ~ nitro + (1 | Block/Variety), data = Oats))
> print(om4 <- lmer(yield ~ nitro + Variety + (1 | Block/Variety), data = Oats))
> anova(om4, om3) ## Does this violate the principle of marginality? (Testing main effect in presence of interaction?)
> anova(om4, om2) ## Would this be the first test of variety?
> anova(om4, om1) ## Would this not be the omnibus test for variety?
> anova(om2, om1)
> If I wanted to assess the effect of variety, it seems to me
> the first test I'd want to look at is model om4 versus model om2.
> This it seems to me would be in the spirit of testing an interaction
> term with both main effects in the model in the old fixed effects
> world.
> An omnibus test of the importance of variety would seem to me
> to be model om4 versus om1. 
> Testing om4 versus om3 seems to me to violate the marginality
> principle (testing a main effect in the presence of an interaction
> involving that main effect).  Or is there something different in
> the mixed effects world - does this marginality principle hold
> for this scenario?  The plots and all the other tests seem to
> strongly suggest that there are important differences due to variety,
> but this test suggests otherwise.  This test does not seem 
> appropriate.  
> Any comments?  Is this paradigm mapping reasonable?

[ Big brutal SNIP of computations... ]

Since it seems that you've got no "authorized" answers, here are two
(€)cents from a mere practitioner :

When you consider an effect as fixed, you are indeed stating that each
of it's levels has a specific meaning and can be reproduced. Exhibiting
an interaction on a dependant variable Y between two fixed effects A and
B means that there is no common effect of, for example, A, and that the
effect of A is dependant of B in a way that cannot be "explained away"
as variability (that's exactly what you do when you reject the null
hypothesis H0 of the absence of interaction between A and B : you
consider that the  probability of <your results or worse> under H0 is
too small to accept the conjunction of your results and H0 ; since you
do not want to reject your results, you reject the other term of the
conjunction, i. e. H0). In consequence, making a prediction on the
effect of A on Y will involve A and B ; however it's variability will be
a function of the variance of Y *for a given value of A and B*.

Demonstrating an interaction between a fixed effect A and a random
effect B means something entirely different : here, you state that B is
a source of randomness that has to be taken into account in predicting
the effect of A. But B is a random effect, that is something (that you
consider) irreproductible. The consequences are different : instead of
saying "my prediction an A's effect cannot be made without accounting
for B", you accept that B will decrease the precision of your prediction
of the effect of A on Y : it will involve the full variability of Y
according to B.

Let's take a (not so imaginary) example, simpler than the "Oats" example
(to which we'll get back eventually) : trying to assess the effect of a
care organization (factor A, that we'll take to have two levels) on the
results Y of treatments given to patients attending various hospitals of
the same hospital group (factor B). To answer this question, a
reasonable way is to run a randomized clinical trial comparing Ys on
patients receiving care under A1 or A2. Let's assume, for sake of
simplicity, that care organization can be choosen at will for each
patient (e. g. an imaging technique for a given clinical question), thus
allowing for simple allocation scheme. Your judgement criterium will be
(say) Y|A2-Y|A1 i. e. differences of means of Y in patients having
received care under A1 and A2 conditions.

To assess this difference of means in the best possible way, you will
adjust on B (center effect) and check the homogeneité of the effect of A
between various Bs (hospitals).

If your point of view is the hospital's group manager, your problem is
to determine how to equip your various centers (a known collection).
Your analysis might be better done considering B as a fixed effect, and
you will assess anova(lm(Y~A*B, data=yourdata)). if the A*B effect "is
significant" (i. e. cannot be explained away as mere natural
variability), your answer will be that there is no "good" universal
solution to your problem, and you will have to compare for each center,
with something along the lines of by(yourdata, yourdata$B,

However, if your point of view is the one of a physician trying to
assess a technique medical value, you might consider your hospital group
as a (pseudo-)random sample of all possible hospitals. Your problem is
to know if, *accounting for inter-centre variability*, there exist a
systematic difference between A1 and A2 nonwhistanding B
"value" (irreproductible by definition). You might try to assess this by
building lmer(Y~A+(A|B), data=yourdata, REML=FALSE) and assessing A2

Having a "significant" A:B interaction means that the A effect exhibits
some nonnegligible variation according to B, but the whole point of the
mixed models exercice is to assess A *against the various sources of

Technical Notes : 1) if you want to make comparisons between models with
differing fixed effects structure, you should use maximum likelihood
maximization, not REML (see P&B chap 2) ; be aware, however, that this
technique might "bias" your estimates more than REML (for the exact
meaning of bias, the importance of these biases, the ways to reduce
them, I defer to thows who know. Mr Bates, would you mind ?)
2) Be aware that Douglas Bates has expressed serious doubts on the
validity of "classical" ways to compare models (e. g. LRT tests, with
"obvious" dof...). This subject has already suscited flows of
ink^Kelectrons on this list, and I don't consider myself competent
enough to express an opinion.
3) This explains why, in older times, people confronted with
interactions of fixed factors were tempted to rechristen one of them as
random and to test the other by comparison of its effect(s) not with the
residual variance but with the interaction variance. That's a "trick" I
learned an (uncomfortably) long time ago, which aims to answer an
ill-formulated question along the lines of "is there a an A effect
clearly (statistically) larger that the variations introduced by B to
this possible A effect ?". Which might be a valid question, but
different of the question originally answered to by the ANOVA...

Now to get back to "Oats". Here, the difficulty is well explained in
both V&R4 and P&B : The experimental plan is a split-plot. Having
exactly 3 plots and 3 varieties per block makes "Variety" in
"Block/Variety" *also* a synonym for "Plot %in% Block", which is a
random effect : there is absolutely no reason to think that the
properties of Plot 2 in Block IV has anything in common with Plot 2 in
Block II, for example. Both "Variety %in% Block" and "Plot %in% Block"
ate *arbitrary* labels.

Therefore, the same ("Variety") variable designates 1) a fixed effect
(reproductible) and 2) a source of variability (irreproductible). The
optimization technique used in lme(r) aims to partition the total
variability between varieties between the "Variety" (fixed) effect and
the "PlotDisguisedAsVariety" (random) effect. R. A. Fisher has proposed
in 1935 a way to do this oin the case of *balanced* data sets (designed
(and well-executed) experiments), implemented in aov(). Here, P&B
chapter 1 discuss this very data set (it's the last introductory
example) ; the discussion in V&R4 is skimpier but gives another "look"a
at this problem. Both are worth reading, and I won't paraphrase ...

I might be interested to any corrections or suggestions about this way
to explain random effects to "laymen" (usually physicians who don't give
a hoot about those technicalities) : that's something I have to explain
more and more often.


					Emmanuel Charpentier

More information about the R-sig-mixed-models mailing list