[R-sig-ME] afex in missing data (when testing main effects in presence of interactions in factorial designs)
Thomas Deimel
t.deimel at yahoo.de
Sat Jun 10 21:31:27 CEST 2017
Hi Henrik and Phillip,
thanks a lot for the detailed answers and for using the example to explain. This has already been very useful and has helped to clear up what I think were misunderstandings on my part.
I have been playing around with the example some more, and for my current purposes I think this is enough. I do still have loads of further questions about this issue in general, though, and my brain normally does not like to let go of questions it has stumbled upon. So, should I find the time to get my Qs in order and do some more reading up first, I hope it’s OK if I come back to ask them.
Best,
Thomas
> On 06.06.2017 at 11:34, Alday, Phillip <Phillip.Alday at mpi.nl> wrote:
>
> Hi Henrik, hi Thomas,
>
> There is one small additional footnote to add to the Type-II/Type-III
> comparison:
>
> Type-II tests respect the principle of marginality, while Type-III
> tests do not. This is discussed somewhat in John Fox's Applied
> Regression textbook and is one of the main points in Venables' Exegeses
> on Linear Models.
>
> https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
>
> In other words: Type-III tests let you test a main effect even in the
> presence of its interaction (by averaging over the levels of the other
> factor), while Type-II tests test the main effect under the assumption
> that the interaction is negligible, so the interaction is not adjusted
> for. When the data are balanced, the two are the same. (If I have
> misunderstood this point or formulated it infelicitously, I am happy
> to be corrected!)
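>
> A minimal sketch of the distinction, using car::Anova() on a toy
> fixed-effects model (the data here are made up purely for
> illustration):
>
> library(car)
> d <- expand.grid(A = factor(c("a1", "a2")),
>                  B = factor(c("b1", "b2", "b3")), rep = 1:10)
> set.seed(2)
> d$y <- rnorm(nrow(d))
> d <- d[-(1:5), ]  # drop rows so the design becomes unbalanced
> ## Type III tests are only meaningful with sum-to-zero contrasts
> fit <- lm(y ~ A * B, d, contrasts = list(A = contr.sum, B = contr.sum))
> Anova(fit, type = 2)  # respects marginality: A tested without A:B
> Anova(fit, type = 3)  # tests A averaged (unweighted) over B, given A:B
>
> With balanced data the two tables agree for the main effects; once
> rows are dropped they diverge.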
>
> Best,
> Phillip
>
> On Mon, 2017-06-05 at 12:15 +0200, Henrik Singmann wrote:
>> Hi Thomas,
>>
>> As afex is mentioned, let me weigh in with my thoughts (though I am,
>> as always, happy for corrections).
>>
>> First of all, afex basically implements the procedure described in
>> the document by Roger Levy when using the same type of tests (i.e.,
>> likelihood ratio tests). Using the second example in the document we
>> get:
>>
>> require(mvtnorm)
>> require(afex)  # also loads lme4
>> set.seed(1)
>> M <- 12
>> ## fully crossed 2 x 3 design with M subjects and M items
>> dat <- expand.grid(X = factor(c("x1", "x2")),
>>                    Y = factor(c("y1", "y2", "y3")),
>>                    subj = factor(paste("S", 1:M)),
>>                    item = factor(paste("I", 1:M)))
>> dat$XY <- with(dat, factor(paste(X, Y)))
>> ## cell means plus by-subject and by-item random effects per cell
>> beta.XY <- matrix(c(1, 4, 5, 2, 3, 3), 2, 3)
>> b.S <- rmvnorm(M, rep(0, 6), diag(6)/100)
>> b.I <- rmvnorm(M, rep(0, 6), diag(6)/100)
>> dat$Response <- with(dat, beta.XY[cbind(X, Y)] + b.S[cbind(subj, XY)] +
>>                        b.I[cbind(item, XY)] + rnorm(nrow(dat), sd = 0.1))
>> ## numeric sum-contrast columns for Y, so that the X main effect can
>> ## be removed from the model while keeping the X:Y interaction
>> Y.numeric <- sapply(dat$Y, function(i) contr.sum(3)[i, ])
>> dat$Y1 <- Y.numeric[1, ]
>> dat$Y2 <- Y.numeric[2, ]
>> m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
>>            dat, REML = FALSE)
>> m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item), dat, REML = FALSE)
>> anova(m0, m1)
>> # Data: dat
>> # Models:
>> # m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
>> # m1: Response ~ X * Y + (XY | subj) + (XY | item)
>> # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
>> # m0 48 -1040.6 -812.01 568.28 -1136.6
>> # m1 49 -1038.7 -805.35 568.33 -1136.7 0.0999 1 0.752
>>
>> (mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item), dat,
>>               method = "LRT"))
>> # Mixed Model Anova Table (Type 3 tests, LRT-method)
>> #
>> # Model: Response ~ X * Y + (XY | subj) + (XY | item)
>> # Data: dat
>> # Df full model: 49
>> # Effect df Chisq p.value
>> # 1 X 1 0.10 .75
>> # 2 Y 2 66.06 *** <.0001
>> # 3 X:Y 2 87.90 *** <.0001
>> # ---
>> # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
>>
>> As can be seen, the test of X is identical, with a chi-square value
>> of 0.0999 (rounded to 0.10) and a p-value of 0.75.
>>
>> We can also see that the intercept corresponds exactly to the grand
>> mean:
>> fixef(mm1$full_model)[1]
>> # (Intercept)
>> # 3.006103
>>
>> mean(dat$Response)
>> # [1] 3.006103
>>
>> Your question is now what happens for unbalanced data. Type III tests
>> (as implemented here and in the document) assume the imbalance is
>> random and not structural (i.e., that data are missing completely at
>> random). This is achieved by weighting the cells equally rather than
>> the data points. In other words, we simply do the analysis as if each
>> cell (i.e., each combination of factor levels) had the same size. As
>> a consequence, the intercept does not correspond to the grand mean of
>> the data, but to the unweighted mean of the cell means. (Note that
>> this only holds approximately for mixed models, as the random effects
>> also play a role here. For standard ANOVAs it is exactly true.)
>>
>> An example shows this. We first remove one data point and then fit
>> the data again (with a much reduced model, as we are only interested
>> in the intercept estimate here):
>>
>> dat2 <- dat[-1, ]  # drop one data point to create imbalance
>>
>> mm2 <- mixed(Response ~ X*Y + (1|subj), dat2, return = "merMod")
>> fixef(mm2)[1]  # intercept
>> # (Intercept)
>> # 3.006236
>>
>> mean(dat2$Response)  # grand mean of the data
>> # [1] 3.008559
>>
>> mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))  # unweighted
>> # [1] 3.006254
>>
>> As can be seen, the intercept corresponds approximately to the
>> unweighted grand mean of all cells, but not to the grand mean of the
>> data.
>>
>> As said above, the small difference between the intercept and the
>> unweighted grand mean is a consequence of the random effects. Change
>> the random effects structure to the one above and you will see an
>> even larger difference. Furthermore, for a standard ANOVA the Type
>> III intercept corresponds exactly to the unweighted grand mean.
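>>
>> A minimal fixed-effects sketch of this exact equivalence, reusing
>> dat2 from above: with sum contrasts and the full factorial model, the
>> lm() intercept equals the unweighted mean of the cell means even for
>> unbalanced data.
>>
>> dat3 <- dat2
>> contrasts(dat3$X) <- contr.sum(2)
>> contrasts(dat3$Y) <- contr.sum(3)
>> fm <- lm(Response ~ X * Y, dat3)
>> coef(fm)[1]                                # intercept
>> mean(tapply(dat3$Response, dat3$XY, mean)) # unweighted cell-mean mean
>>
>> The two numbers agree exactly, since the saturated fixed-effects
>> model reproduces the observed cell means.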
>>
>> What this tells us is that Type III tests are only appropriate if the
>> missingness is not structural but random. In other words, for Type
>> III tests the treatment (i.e., the condition levels) should not be
>> the cause of the missing data. Type III tests correct for missing
>> data by simply treating the data as if there were no missing data
>> (i.e., all cells are weighted equally).
>>
>> If the missingness were structural (i.e., a consequence of the factor
>> levels), this approach is of course not appropriate. Then we would
>> want to use something like Type II tests, or what you describe, in
>> which the group sizes are taken into account and the intercept
>> corresponds to the grand mean. For example, for observational data,
>> where group sizes are very often informative, Type II tests seem more
>> appropriate.
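>>
>> As a sketch, assuming a reasonably recent afex: Type II tests can be
>> requested directly via the type argument of mixed(), e.g., with the
>> reduced random effects structure from above,
>>
>> mixed(Response ~ X*Y + (1|subj), dat2, type = 2, method = "LRT")
>>
>> which tests each main effect without adjusting for the interaction.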
>>
>> I expect this difference in how Type II and Type III tests deal with
>> imbalance is also one of the reasons why statisticians who work a lot
>> with observational data prefer Type II over Type III tests, whereas
>> those who work with experimental data (where imbalance is usually
>> small and random) prefer Type III over Type II tests.
>>
>> Hope that helps,
>> Henrik
>>
>> PS: It always helps to include some example data.
>>
>> On 04.06.2017 at 15:41, Thomas Deimel via R-sig-mixed-models wrote:
>>>
>>> Hi everyone,
>>>
>>> I am using lmer (or mixed() from the afex package) to implement a
>>> linear mixed model of the form
>>>
>>> y~1+x1+x2+x1:x2+(by-subject random effects),
>>>
>>> where x1 and x2 are both categorical variables/factors. The design
>>> is balanced but there are missing values for some x2 levels in some
>>> of the subjects.
>>>
>>> I am interested in testing the main effect of, say, x1 in the
>>> presence of the interaction term x1:x2. In order to achieve this, I
>>> convert x2 to a numeric variable using sum contrasts, as summarized
>>> for example [here][1]. This gives the same results for inference
>>> (nearly identical p-values) as using mixed() from the afex package.
>>> This corresponds to a "Type III test".
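>>>
>>> For concreteness, a minimal sketch of this coding with hypothetical
>>> names: a data frame d with response y, factors x1 and x2 (x2 with
>>> three levels), and subject id subj.
>>>
>>> library(lme4)
>>> C <- contr.sum(nlevels(d$x2))  # sum-contrast matrix for x2
>>> d$x2c1 <- C[d$x2, 1]           # numeric contrast columns
>>> d$x2c2 <- C[d$x2, 2]
>>> full    <- lmer(y ~ x1 + x2c1 + x2c2 + x1:x2c1 + x1:x2c2 +
>>>                   (1|subj), d, REML = FALSE)
>>> reduced <- lmer(y ~ x2c1 + x2c2 + x1:x2c1 + x1:x2c2 +
>>>                   (1|subj), d, REML = FALSE)
>>> anova(reduced, full)  # Type III style LRT for the x1 main effect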
>>>
>>> The linked summary, however, mentions that the main effect we test
>>> essentially corresponds to the effect of x1 averaged over all levels
>>> of x2 (which I assume corresponds to x2=0 when converted to numeric
>>> using sum contrasts). But there are missing data for y for some of
>>> the levels of x2 in some of the subjects, so how does this affect
>>> the averaging?
>>>
>>> My Qs: Given the above implementation of the model,...
>>>
>>> 1) ...do missing values lead to putting stronger weight on the x2
>>> levels that do not have missing values when averaging over x2 levels
>>> to calculate the main effect of x1? Or is R aware of this distortion
>>> and does it calculate a weighted average of some sort (I don't see
>>> how)?
>>>
>>> 2) ...is this a problem? I assume it could lead to distorted
>>> estimates of main effect significance in smaller data sets, or if
>>> there are a lot of missing values for certain levels of x2.
>>>
>>> 3) ...is there a way around it, like forcing R to weight the
>>> average? Or would it be advisable to compare y~x2 to y~x1+x2 instead
>>> (see the sketch below)? I would certainly be inclined to use this if
>>> the interaction were not significant (agreed?), but what if it is
>>> significant? Any other options?
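>>>
>>> A minimal sketch of that comparison, reusing the hypothetical d (and
>>> lme4) from the sketch above; this is essentially a Type II style
>>> test of x1:
>>>
>>> f0 <- lmer(y ~ x2 + (1|subj), d, REML = FALSE)
>>> f1 <- lmer(y ~ x1 + x2 + (1|subj), d, REML = FALSE)
>>> anova(f0, f1)  # likelihood ratio test of the x1 main effect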
>>>
>>> Thanks for the help,
>>>
>>> Thomas
>>>
>>> [1]: https://arxiv.org/pdf/1405.2094.pdf "here"