[R-sig-ME] afex in missing data (when testing main effects in presence of interactions in factorial designs)

Alday, Phillip Phillip.Alday at mpi.nl
Tue Jun 6 12:34:45 CEST 2017


Hi Henrik, hi Thomas,

There is one small additional footnote to add to the Type-II/Type-III
comparison:

Type-II tests respect the principle of marginality, while Type-III
tests do not. This is discussed somewhat in John Fox's Applied
Regression textbook and is one of the main points in Venables' Exegeses
on Linear Models.

https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

In other words: Type-III tests allow you to test main effects as if the
interaction were not there, while Type-II tests incorporate both the
interaction and main effect into the test for the main effect. When the
data are balanced, this are the same. (If I have misunderstood this
point or formulated it infelicitously, I am happy to be corrected!) 

Best,
Phillip

On Mon, 2017-06-05 at 12:15 +0200, Henrik Singmann wrote:
> Hi Thomas,
> 
> As afex is mentioned, let me weigh in with my thoughts (though I am
> as 
> always happy for corrections).
> 
> First of all, afex implements basically the procedure described in
> the 
> document by Roger Levy, when using the same type of tests (i.e., 
> likelihood ratio tests). Using the second example in the document we
> get:
> 
> require(mvtnorm)
> require(afex)
> set.seed(1)
> M <- 12
> dat <- expand.grid(X=factor(c("x1","x2")),
>                     Y=factor(c("y1","y2","y3")),
>                     subj=factor(paste("S",1:M)),
>                     item=factor(paste("I",1:M)))
> dat$XY <- with(dat,factor(paste(X,Y)))
> beta.XY <- matrix(c(1,4,5,2,3,3),2,3)
> b.S <- rmvnorm(M,rep(0,6),diag(6)/100)
> b.I <- rmvnorm(M,rep(0,6),diag(6)/100)
> dat$Response <- with(dat,beta.XY[cbind(X,Y)] + b.S[cbind(subj,XY)] +
>                         b.I[cbind(item,XY)] +
> rnorm(nrow(dat),sd=0.1))
> Y.numeric <- sapply(dat$Y,function(i) contr.sum(3)[i,])
> dat$Y1 <- Y.numeric[1,]
> dat$Y2 <- Y.numeric[2,]
> m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
>             dat, REML = FALSE)
> m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item),dat, REML = FALSE)
> anova(m0,m1)
> # Data: dat
> # Models:
> # m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
> # m1: Response ~ X * Y + (XY | subj) + (XY | item)
> #    Df     AIC     BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
> # m0 48 -1040.6 -812.01 568.28  -1136.6
> # m1 49 -1038.7 -805.35 568.33  -1136.7 0.0999      1      0.752
> 
> (mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item),dat,
>                method = "LRT"))
> # Mixed Model Anova Table (Type 3 tests, LRT-method)
> #
> # Model: Response ~ X * Y + (XY | subj) + (XY | item)
> # Data: dat
> # Df full model: 49
> #   Effect df     Chisq p.value
> # 1      X  1      0.10     .75
> # 2      Y  2 66.06 ***  <.0001
> # 3    X:Y  2 87.90 ***  <.0001
> # ---
> # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
> 
> As can be seen, the test of x is identical with a chi-square value
> of 
> 0.0999 (rounded to 0.10) and a p-value of 0.75.
> 
> We can also see that the intercept correspond exactly to the grand
> mean:
> fixef(mm1$full_model)[1]
> # (Intercept)
> #    3.006103
> 
> mean(dat$Response)
> # [1] 3.006103
> 
> Your question is now what happens for unbalanced data. Type III
> tests 
> (as implemented here and in the document) assume the imbalance is
> random 
> and not structural (i.e., data is missing completely at random). This
> is 
> achieved by weighting the cells equally and not the data points. In 
> other words, we simply do the analysis assuming each cell (i.e., 
> combination of factors) had the same size. As a consequence, the
> grand 
> mean does not correspond to the grand mean of the data, but to the 
> unweighted mean of the cells. (Note that this only holds
> approximately 
> for mixed models, as the random effects also play a role here. For 
> standard ANOVAs this is exactly true.)
> 
> An example can show this. We first remove one data point and then
> fit 
> the data again (with a very reduced model as we are only interested
> in 
> the intercept estimate here):
> 
> dat2 <- dat[-1,]
> 
> mm2 <- mixed(Response ~ X*Y + (1|subj),dat2, return="merMod")
> fixef(mm2)[1]
> # (Intercept)
> #    3.006236
> 
> mean(dat2$Response)
> # [1] 3.008559
> 
> mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))
> # [1] 3.006254
> 
> As can be seen, the intercept corresponds approximately to the 
> unweighted grand mean of all cells, but not to the grand mean of the
> data.
> 
> As said above, the small difference between the intercept and the
> grand 
> mean are a consequence of the random effects. Change the random
> effects 
> structure to the one above and you will see even more differences. 
> Furthermore, for standard ANOVA Type III tests exactly correspond to
> the 
> unweighted grand mean
> 
> What this tells us is that Type III tests are only appropriate if
> the 
> missing is not structural but random. In other words, the treatment 
> (i.e., condition levels) should not be the cause for the missing
> data 
> for Type III tests. Type III test correct for missing data by simply 
> treating the data as having no missing data (i.e., all cells are
> weighed 
> equally).
> 
> If the missing were structural (i.e., a consequence of the factor 
> levels) this approach is of course not appropriate. Then we would
> want 
> to use something like Type II tests or what you describe in which
> the 
> group sizes are taken into account and the intercept correspond to
> the 
> grand mean. For example, for observational data where group sizes
> are 
> very often informative Type II tests seem more appropriate.
> 
> I expect this difference in how Type II and Type III tests deal with 
> imbalance is also one of the reasons why those statisticans that work
>> lot with observational data prefer Type II over Type III tests
> whereas 
> those that work with experimental data (where imbalance is usually
> small 
> and random) prefer Type III tests over Type II test.
> 
> Hope that helps,
> Henrik
> 
> PS: It always helps to include some example data.
> 
> Am 04.06.2017 um 15:41 schrieb Thomas Deimel via R-sig-mixed-models:
> > 
> > Hi everyone,
> > 
> > I am using lmer (or mixed() from the afex package) to implement a
> > linear mixed model of the form
> > 
> > y~1+x1+x2+x1:x2+(by-subject random effects),
> > 
> > where x1 and x2 are both categorical variables/factors. The design
> > is balanced but there are missing values for some x2 levels in some
> > of the subjects.
> > 
> > I am interested in testing the main effect of say x1 in the
> > presence of the interaction term x1:x2. In order to achieve this, I
> > convert x2 to a numeric variable using sum contrasts - as
> > summarized for example [here][1]. This gives the same results for
> > inference (nearly identical p-values) as using mixed() from the
> > afex package. This corresponds to a "type III test".
> > 
> > The linked summary, however, mentions that the main effect we test
> > essentially corresponds to the effect of x1 averaged over all
> > levels of x2 (which I assume corresponds to x2=0 when converted to
> > numeric and using true contrasts). But, there are missing data for
> > y for some of the levels of x2 in some of the subjects, so how does
> > this affect the averaging.
> > 
> > My Qs: Given the above implementation of the model,...
> > 
> > 1) ...do missing values lead to putting a stronger weight on x2
> > levels that do not have missing values when averaging over x2
> > levels to calculate the main effect of x1? OR is R aware of that
> > distortion and calculates a weighted average of some sort (I don't
> > see how)?
> > 
> > 2) ...is this a problem? I assume it could lead to distorted
> > estimation of main effect significance in smaller data sets or if
> > there are a lot of missing values for certain levels of x2?
> > 
> > 3) ...is there a way around it, like forcing R to weighting the
> > average? Or would it be advisable to compare y~x2 to y~x1+x2,
> > instead? I would certainly be inclined to use this if the
> > interaction was insignificant (agreed?) but what if it is
> > significant? Any other options?
> > 
> > Thanks for the help,
> > 
> > Thomas
> > 
> > 
> > 
> > 
> > 
> > 
> >    [1]: https://arxiv.org/pdf/1405.2094.pdf "here"
> > 	[[alternative HTML version deleted]]
> > 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


More information about the R-sig-mixed-models mailing list