[R-sig-ME] afex in missing data (when testing main effects in presence of interactions in factorial designs)
singmann at psychologie.uzh.ch
Mon Jun 5 12:15:19 CEST 2017
As afex is mentioned, let me weigh in with my thoughts (though I am as
always happy for corrections).
First of all, afex implements basically the procedure described in the
document by Roger Levy, when using the same type of tests (i.e.,
likelihood ratio tests). Using the second example in the document we get:
M <- 12
dat <- expand.grid(X=factor(c("x1","x2")),
dat$XY <- with(dat,factor(paste(X,Y)))
beta.XY <- matrix(c(1,4,5,2,3,3),2,3)
b.S <- rmvnorm(M,rep(0,6),diag(6)/100)
b.I <- rmvnorm(M,rep(0,6),diag(6)/100)
dat$Response <- with(dat,beta.XY[cbind(X,Y)] + b.S[cbind(subj,XY)] +
b.I[cbind(item,XY)] + rnorm(nrow(dat),sd=0.1))
Y.numeric <- sapply(dat$Y,function(i) contr.sum(3)[i,])
dat$Y1 <- Y.numeric[1,]
dat$Y2 <- Y.numeric[2,]
m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
dat, REML = FALSE)
m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item),dat, REML = FALSE)
# Data: dat
# m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
# m1: Response ~ X * Y + (XY | subj) + (XY | item)
# Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
# m0 48 -1040.6 -812.01 568.28 -1136.6
# m1 49 -1038.7 -805.35 568.33 -1136.7 0.0999 1 0.752
(mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item),dat,
method = "LRT"))
# Mixed Model Anova Table (Type 3 tests, LRT-method)
# Model: Response ~ X * Y + (XY | subj) + (XY | item)
# Data: dat
# Df full model: 49
# Effect df Chisq p.value
# 1 X 1 0.10 .75
# 2 Y 2 66.06 *** <.0001
# 3 X:Y 2 87.90 *** <.0001
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
As can be seen, the test of x is identical with a chi-square value of
0.0999 (rounded to 0.10) and a p-value of 0.75.
We can also see that the intercept correspond exactly to the grand mean:
#  3.006103
Your question is now what happens for unbalanced data. Type III tests
(as implemented here and in the document) assume the imbalance is random
and not structural (i.e., data is missing completely at random). This is
achieved by weighting the cells equally and not the data points. In
other words, we simply do the analysis assuming each cell (i.e.,
combination of factors) had the same size. As a consequence, the grand
mean does not correspond to the grand mean of the data, but to the
unweighted mean of the cells. (Note that this only holds approximately
for mixed models, as the random effects also play a role here. For
standard ANOVAs this is exactly true.)
An example can show this. We first remove one data point and then fit
the data again (with a very reduced model as we are only interested in
the intercept estimate here):
dat2 <- dat[-1,]
mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
#  3.008559
mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
#  3.006254
As can be seen, the intercept corresponds approximately to the
unweighted grand mean of all cells, but not to the grand mean of the data.
As said above, the small difference between the intercept and the grand
mean are a consequence of the random effects. Change the random effects
structure to the one above and you will see even more differences.
Furthermore, for standard ANOVA Type III tests exactly correspond to the
unweighted grand mean
What this tells us is that Type III tests are only appropriate if the
missing is not structural but random. In other words, the treatment
(i.e., condition levels) should not be the cause for the missing data
for Type III tests. Type III test correct for missing data by simply
treating the data as having no missing data (i.e., all cells are weighed
If the missing were structural (i.e., a consequence of the factor
levels) this approach is of course not appropriate. Then we would want
to use something like Type II tests or what you describe in which the
group sizes are taken into account and the intercept correspond to the
grand mean. For example, for observational data where group sizes are
very often informative Type II tests seem more appropriate.
I expect this difference in how Type II and Type III tests deal with
imbalance is also one of the reasons why those statisticans that work a
lot with observational data prefer Type II over Type III tests whereas
those that work with experimental data (where imbalance is usually small
and random) prefer Type III tests over Type II test.
Hope that helps,
PS: It always helps to include some example data.
Am 04.06.2017 um 15:41 schrieb Thomas Deimel via R-sig-mixed-models:
> Hi everyone,
> I am using lmer (or mixed() from the afex package) to implement a linear mixed model of the form
> y~1+x1+x2+x1:x2+(by-subject random effects),
> where x1 and x2 are both categorical variables/factors. The design is balanced but there are missing values for some x2 levels in some of the subjects.
> I am interested in testing the main effect of say x1 in the presence of the interaction term x1:x2. In order to achieve this, I convert x2 to a numeric variable using sum contrasts - as summarized for example [here]. This gives the same results for inference (nearly identical p-values) as using mixed() from the afex package. This corresponds to a "type III test".
> The linked summary, however, mentions that the main effect we test essentially corresponds to the effect of x1 averaged over all levels of x2 (which I assume corresponds to x2=0 when converted to numeric and using true contrasts). But, there are missing data for y for some of the levels of x2 in some of the subjects, so how does this affect the averaging.
> My Qs: Given the above implementation of the model,...
> 1) ...do missing values lead to putting a stronger weight on x2 levels that do not have missing values when averaging over x2 levels to calculate the main effect of x1? OR is R aware of that distortion and calculates a weighted average of some sort (I don't see how)?
> 2) ...is this a problem? I assume it could lead to distorted estimation of main effect significance in smaller data sets or if there are a lot of missing values for certain levels of x2?
> 3) ...is there a way around it, like forcing R to weighting the average? Or would it be advisable to compare y~x2 to y~x1+x2, instead? I would certainly be inclined to use this if the interaction was insignificant (agreed?) but what if it is significant? Any other options?
> Thanks for the help,
> : https://arxiv.org/pdf/1405.2094.pdf "here"
> [[alternative HTML version deleted]]
More information about the R-sig-mixed-models