[R-sig-ME] afex in missing data (when testing main effects in presence of interactions in factorial designs)

Henrik Singmann singmann at psychologie.uzh.ch
Mon Jun 5 12:15:19 CEST 2017

Hi Thomas,

As afex is mentioned, let me weigh in with my thoughts (though I am as 
always happy for corrections).

First of all, afex implements basically the procedure described in the 
document by Roger Levy, when using the same type of tests (i.e., 
likelihood ratio tests). Using the second example in the document we get:

M <- 12
dat <- expand.grid(X=factor(c("x1","x2")),
dat$XY <- with(dat,factor(paste(X,Y)))
beta.XY <- matrix(c(1,4,5,2,3,3),2,3)
b.S <- rmvnorm(M,rep(0,6),diag(6)/100)
b.I <- rmvnorm(M,rep(0,6),diag(6)/100)
dat$Response <- with(dat,beta.XY[cbind(X,Y)] + b.S[cbind(subj,XY)] +
                        b.I[cbind(item,XY)] + rnorm(nrow(dat),sd=0.1))
Y.numeric <- sapply(dat$Y,function(i) contr.sum(3)[i,])
dat$Y1 <- Y.numeric[1,]
dat$Y2 <- Y.numeric[2,]
m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
            dat, REML = FALSE)
m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item),dat, REML = FALSE)
# Data: dat
# Models:
# m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
# m1: Response ~ X * Y + (XY | subj) + (XY | item)
#    Df     AIC     BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
# m0 48 -1040.6 -812.01 568.28  -1136.6
# m1 49 -1038.7 -805.35 568.33  -1136.7 0.0999      1      0.752

(mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item),dat,
               method = "LRT"))
# Mixed Model Anova Table (Type 3 tests, LRT-method)
# Model: Response ~ X * Y + (XY | subj) + (XY | item)
# Data: dat
# Df full model: 49
#   Effect df     Chisq p.value
# 1      X  1      0.10     .75
# 2      Y  2 66.06 ***  <.0001
# 3    X:Y  2 87.90 ***  <.0001
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

As can be seen, the test of x is identical with a chi-square value of 
0.0999 (rounded to 0.10) and a p-value of 0.75.

We can also see that the intercept correspond exactly to the grand mean:
# (Intercept)
#    3.006103

# [1] 3.006103

Your question is now what happens for unbalanced data. Type III tests 
(as implemented here and in the document) assume the imbalance is random 
and not structural (i.e., data is missing completely at random). This is 
achieved by weighting the cells equally and not the data points. In 
other words, we simply do the analysis assuming each cell (i.e., 
combination of factors) had the same size. As a consequence, the grand 
mean does not correspond to the grand mean of the data, but to the 
unweighted mean of the cells. (Note that this only holds approximately 
for mixed models, as the random effects also play a role here. For 
standard ANOVAs this is exactly true.)

An example can show this. We first remove one data point and then fit 
the data again (with a very reduced model as we are only interested in 
the intercept estimate here):

dat2 <- dat[-1,]

mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
# (Intercept)
#    3.006236

# [1] 3.008559

mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
# [1] 3.006254

As can be seen, the intercept corresponds approximately to the 
unweighted grand mean of all cells, but not to the grand mean of the data.

As said above, the small difference between the intercept and the grand 
mean are a consequence of the random effects. Change the random effects 
structure to the one above and you will see even more differences. 
Furthermore, for standard ANOVA Type III tests exactly correspond to the 
unweighted grand mean

What this tells us is that Type III tests are only appropriate if the 
missing is not structural but random. In other words, the treatment 
(i.e., condition levels) should not be the cause for the missing data 
for Type III tests. Type III test correct for missing data by simply 
treating the data as having no missing data (i.e., all cells are weighed 

If the missing were structural (i.e., a consequence of the factor 
levels) this approach is of course not appropriate. Then we would want 
to use something like Type II tests or what you describe in which the 
group sizes are taken into account and the intercept correspond to the 
grand mean. For example, for observational data where group sizes are 
very often informative Type II tests seem more appropriate.

I expect this difference in how Type II and Type III tests deal with 
imbalance is also one of the reasons why those statisticans that work a 
lot with observational data prefer Type II over Type III tests whereas 
those that work with experimental data (where imbalance is usually small 
and random) prefer Type III tests over Type II test.

Hope that helps,

PS: It always helps to include some example data.

Am 04.06.2017 um 15:41 schrieb Thomas Deimel via R-sig-mixed-models:
> Hi everyone,
> I am using lmer (or mixed() from the afex package) to implement a linear mixed model of the form
> y~1+x1+x2+x1:x2+(by-subject random effects),
> where x1 and x2 are both categorical variables/factors. The design is balanced but there are missing values for some x2 levels in some of the subjects.
> I am interested in testing the main effect of say x1 in the presence of the interaction term x1:x2. In order to achieve this, I convert x2 to a numeric variable using sum contrasts - as summarized for example [here][1]. This gives the same results for inference (nearly identical p-values) as using mixed() from the afex package. This corresponds to a "type III test".
> The linked summary, however, mentions that the main effect we test essentially corresponds to the effect of x1 averaged over all levels of x2 (which I assume corresponds to x2=0 when converted to numeric and using true contrasts). But, there are missing data for y for some of the levels of x2 in some of the subjects, so how does this affect the averaging.
> My Qs: Given the above implementation of the model,...
> 1) ...do missing values lead to putting a stronger weight on x2 levels that do not have missing values when averaging over x2 levels to calculate the main effect of x1? OR is R aware of that distortion and calculates a weighted average of some sort (I don't see how)?
> 2) ...is this a problem? I assume it could lead to distorted estimation of main effect significance in smaller data sets or if there are a lot of missing values for certain levels of x2?
> 3) ...is there a way around it, like forcing R to weighting the average? Or would it be advisable to compare y~x2 to y~x1+x2, instead? I would certainly be inclined to use this if the interaction was insignificant (agreed?) but what if it is significant? Any other options?
> Thanks for the help,
> Thomas
>    [1]: https://arxiv.org/pdf/1405.2094.pdf "here"
> 	[[alternative HTML version deleted]]

More information about the R-sig-mixed-models mailing list