[R-sig-ME] afex in missing data (when testing main effects in presence of interactions in factorial designs)
Thomas Deimel
t.deimel at yahoo.de
Sat Jun 10 21:31:27 CEST 2017
Hi Henrik and Phillip,
thanks a lot for the detailed answers and for using the example to explain. This has already been very useful and has helped to clear up what I think were misunderstandings on my part.
I have been playing around with the example some more, and for my current purposes I think this is enough. I do still have loads of further questions about this issue in general, though, and my brain normally does not like to let go of questions it has stumbled upon. So, should I find the time to get my Qs in order and do some more reading up first, I hope it’s OK if I come back to ask them.
Best,
Thomas
> On 06.06.2017 at 11:34, Alday, Phillip <Phillip.Alday at mpi.nl> wrote:
>
> Hi Henrik, hi Thomas,
>
> There is one small additional footnote to add to the Type-II/Type-III
> comparison:
>
> Type-II tests respect the principle of marginality, while Type-III
> tests do not. This is discussed somewhat in John Fox's Applied
> Regression textbook and is one of the main points in Venables' Exegeses
> on Linear Models.
>
> https://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
>
> In other words: Type-III tests let you test a main effect even in the
> presence of its interaction (by averaging over the levels of the other
> factor), while Type-II tests test the main effect under the assumption
> that the interaction is negligible, so the interaction is not adjusted
> for. When the data are balanced, the two are the same. (If I have
> misunderstood this point or formulated it infelicitously, I am happy
> to be corrected!)
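>
> A minimal sketch of the distinction, using car::Anova() on a toy
> fixed-effects model (the data here are made up purely for
> illustration):
>
> library(car)
> d <- expand.grid(A = factor(c("a1", "a2")),
>                  B = factor(c("b1", "b2", "b3")), rep = 1:10)
> set.seed(2)
> d$y <- rnorm(nrow(d))
> d <- d[-(1:5), ]  # drop rows so the design becomes unbalanced
> ## Type III tests are only meaningful with sum-to-zero contrasts
> fit <- lm(y ~ A * B, d, contrasts = list(A = contr.sum, B = contr.sum))
> Anova(fit, type = 2)  # respects marginality: A tested without A:B
> Anova(fit, type = 3)  # tests A averaged (unweighted) over B, given A:B
>
> With balanced data the two tables agree for the main effects; once
> rows are dropped they diverge.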
>
> Best,
> Phillip
>
> On Mon, 2017-06-05 at 12:15 +0200, Henrik Singmann wrote:
>> Hi Thomas,
>>
>> As afex is mentioned, let me weigh in with my thoughts (though I am,
>> as always, happy for corrections).
>>
>> First of all, afex basically implements the procedure described in
>> the document by Roger Levy when using the same type of tests (i.e.,
>> likelihood ratio tests). Using the second example in the document we
>> get:
>>
>> require(mvtnorm)
>> require(afex)  # also loads lme4
>> set.seed(1)
>> M <- 12
>> ## fully crossed 2 x 3 design with M subjects and M items
>> dat <- expand.grid(X = factor(c("x1", "x2")),
>>                    Y = factor(c("y1", "y2", "y3")),
>>                    subj = factor(paste("S", 1:M)),
>>                    item = factor(paste("I", 1:M)))
>> dat$XY <- with(dat, factor(paste(X, Y)))
>> ## cell means plus by-subject and by-item random effects per cell
>> beta.XY <- matrix(c(1, 4, 5, 2, 3, 3), 2, 3)
>> b.S <- rmvnorm(M, rep(0, 6), diag(6)/100)
>> b.I <- rmvnorm(M, rep(0, 6), diag(6)/100)
>> dat$Response <- with(dat, beta.XY[cbind(X, Y)] + b.S[cbind(subj, XY)] +
>>                        b.I[cbind(item, XY)] + rnorm(nrow(dat), sd = 0.1))
>> ## numeric sum-contrast columns for Y, so that the X main effect can
>> ## be removed from the model while keeping the X:Y interaction
>> Y.numeric <- sapply(dat$Y, function(i) contr.sum(3)[i, ])
>> dat$Y1 <- Y.numeric[1, ]
>> dat$Y2 <- Y.numeric[2, ]
>> m0 <- lmer(Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY|subj) + (XY|item),
>>            dat, REML = FALSE)
>> m1 <- lmer(Response ~ X*Y + (XY|subj) + (XY|item), dat, REML = FALSE)
>> anova(m0, m1)
>> # Data: dat
>> # Models:
>> # m0: Response ~ Y1 + X:Y1 + Y2 + X:Y2 + (XY | subj) + (XY | item)
>> # m1: Response ~ X * Y + (XY | subj) + (XY | item)
>> # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
>> # m0 48 -1040.6 -812.01 568.28 -1136.6
>> # m1 49 -1038.7 -805.35 568.33 -1136.7 0.0999 1 0.752
>>
>> (mm1 <- mixed(Response ~ X*Y + (XY|subj) + (XY|item), dat,
>>               method = "LRT"))
>> # Mixed Model Anova Table (Type 3 tests, LRT-method)
>> #
>> # Model: Response ~ X * Y + (XY | subj) + (XY | item)
>> # Data: dat
>> # Df full model: 49
>> # Effect df Chisq p.value
>> # 1 X 1 0.10 .75
>> # 2 Y 2 66.06 *** <.0001
>> # 3 X:Y 2 87.90 *** <.0001
>> # ---
>> # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
>>
>> As can be seen, the test of X is identical, with a chi-square value
>> of 0.0999 (rounded to 0.10) and a p-value of 0.75.
>>
>> We can also see that the intercept corresponds exactly to the grand
>> mean:
>> fixef(mm1$full_model)[1]
>> # (Intercept)
>> # 3.006103
>>
>> mean(dat$Response)
>> # [1] 3.006103
>>
>> Your question is now what happens for unbalanced data. Type III tests
>> (as implemented here and in the document) assume the imbalance is
>> random and not structural (i.e., that data are missing completely at
>> random). This is achieved by weighting the cells equally rather than
>> the data points. In other words, we simply do the analysis as if each
>> cell (i.e., each combination of factor levels) had the same size. As
>> a consequence, the intercept does not correspond to the grand mean of
>> the data, but to the unweighted mean of the cell means. (Note that
>> this only holds approximately for mixed models, as the random effects
>> also play a role here. For standard ANOVAs it is exactly true.)
>>
>> An example shows this. We first remove one data point and then fit
>> the data again (with a much reduced model, as we are only interested
>> in the intercept estimate here):
>>
>> dat2 <- dat[-1, ]  # drop one data point to create imbalance
>>
>> mm2 <- mixed(Response ~ X*Y + (1|subj), dat2, return = "merMod")
>> fixef(mm2)[1]  # intercept
>> # (Intercept)
>> # 3.006236
>>
>> mean(dat2$Response)  # grand mean of the data
>> # [1] 3.008559
>>
>> mean(tapply(dat2$Response, INDEX = list(dat2$XY), mean))  # unweighted
>> # [1] 3.006254
>>
>> As can be seen, the intercept corresponds approximately to the
>> unweighted grand mean of all cells, but not to the grand mean of the
>> data.
>>
>> As said above, the small difference between the intercept and the
>> unweighted grand mean is a consequence of the random effects. Change
>> the random effects structure to the one above and you will see an
>> even larger difference. Furthermore, for a standard ANOVA the Type
>> III intercept corresponds exactly to the unweighted grand mean.
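>>
>> A minimal fixed-effects sketch of this exact equivalence, reusing
>> dat2 from above: with sum contrasts and the full factorial model, the
>> lm() intercept equals the unweighted mean of the cell means even for
>> unbalanced data.
>>
>> dat3 <- dat2
>> contrasts(dat3$X) <- contr.sum(2)
>> contrasts(dat3$Y) <- contr.sum(3)
>> fm <- lm(Response ~ X * Y, dat3)
>> coef(fm)[1]                                # intercept
>> mean(tapply(dat3$Response, dat3$XY, mean)) # unweighted cell-mean mean
>>
>> The two numbers agree exactly, since the saturated fixed-effects
>> model reproduces the observed cell means.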
>>
>> What this tells us is that Type III tests are only appropriate if the
>> missingness is not structural but random. In other words, for Type
>> III tests the treatment (i.e., the condition levels) should not be
>> the cause of the missing data. Type III tests correct for missing
>> data by simply treating the data as if there were no missing data
>> (i.e., all cells are weighted equally).
>>
>> If the missingness were structural (i.e., a consequence of the factor
>> levels), this approach is of course not appropriate. Then we would
>> want to use something like Type II tests, or what you describe, in
>> which the group sizes are taken into account and the intercept
>> corresponds to the grand mean. For example, for observational data,
>> where group sizes are very often informative, Type II tests seem more
>> appropriate.
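>>
>> As a sketch, assuming a reasonably recent afex: Type II tests can be
>> requested directly via the type argument of mixed(), e.g., with the
>> reduced random effects structure from above,
>>
>> mixed(Response ~ X*Y + (1|subj), dat2, type = 2, method = "LRT")
>>
>> which tests each main effect without adjusting for the interaction.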
>>
>> I expect this difference in how Type II and Type III tests deal with
>> imbalance is also one of the reasons why statisticians who work a lot
>> with observational data prefer Type II over Type III tests, whereas
>> those who work with experimental data (where imbalance is usually
>> small and random) prefer Type III over Type II tests.
>>
>> Hope that helps,
>> Henrik
>>
>> PS: It always helps to include some example data.
>>
>> On 04.06.2017 at 15:41, Thomas Deimel via R-sig-mixed-models wrote:
>>>
>>> Hi everyone,
>>>
>>> I am using lmer (or mixed() from the afex package) to implement a
>>> linear mixed model of the form
>>>
>>> y~1+x1+x2+x1:x2+(by-subject random effects),
>>>
>>> where x1 and x2 are both categorical variables/factors. The design
>>> is balanced but there are missing values for some x2 levels in some
>>> of the subjects.
>>>
>>> I am interested in testing the main effect of, say, x1 in the
>>> presence of the interaction term x1:x2. In order to achieve this, I
>>> convert x2 to a numeric variable using sum contrasts, as summarized
>>> for example [here][1]. This gives the same results for inference
>>> (nearly identical p-values) as using mixed() from the afex package.
>>> This corresponds to a "Type III test".
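>>>
>>> For concreteness, a minimal sketch of this coding with hypothetical
>>> names: a data frame d with response y, factors x1 and x2 (x2 with
>>> three levels), and subject id subj.
>>>
>>> library(lme4)
>>> C <- contr.sum(nlevels(d$x2))  # sum-contrast matrix for x2
>>> d$x2c1 <- C[d$x2, 1]           # numeric contrast columns
>>> d$x2c2 <- C[d$x2, 2]
>>> full    <- lmer(y ~ x1 + x2c1 + x2c2 + x1:x2c1 + x1:x2c2 +
>>>                   (1|subj), d, REML = FALSE)
>>> reduced <- lmer(y ~ x2c1 + x2c2 + x1:x2c1 + x1:x2c2 +
>>>                   (1|subj), d, REML = FALSE)
>>> anova(reduced, full)  # Type III style LRT for the x1 main effect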
>>>
>>> The linked summary, however, mentions that the main effect we test
>>> essentially corresponds to the effect of x1 averaged over all levels
>>> of x2 (which I assume corresponds to x2=0 when converted to numeric
>>> using sum contrasts). But there are missing data for y for some of
>>> the levels of x2 in some of the subjects, so how does this affect
>>> the averaging?
>>>
>>> My Qs: Given the above implementation of the model,...
>>>
>>> 1) ...do missing values lead to putting stronger weight on the x2
>>> levels that do not have missing values when averaging over x2 levels
>>> to calculate the main effect of x1? Or is R aware of this distortion
>>> and does it calculate a weighted average of some sort (I don't see
>>> how)?
>>>
>>> 2) ...is this a problem? I assume it could lead to distorted
>>> estimates of main effect significance in smaller data sets, or if
>>> there are a lot of missing values for certain levels of x2.
>>>
>>> 3) ...is there a way around it, like forcing R to weight the
>>> average? Or would it be advisable to compare y~x2 to y~x1+x2 instead
>>> (see the sketch below)? I would certainly be inclined to use this if
>>> the interaction were not significant (agreed?), but what if it is
>>> significant? Any other options?
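>>>
>>> A minimal sketch of that comparison, reusing the hypothetical d (and
>>> lme4) from the sketch above; this is essentially a Type II style
>>> test of x1:
>>>
>>> f0 <- lmer(y ~ x2 + (1|subj), d, REML = FALSE)
>>> f1 <- lmer(y ~ x1 + x2 + (1|subj), d, REML = FALSE)
>>> anova(f0, f1)  # likelihood ratio test of the x1 main effect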
>>>
>>> Thanks for the help,
>>>
>>> Thomas
>>>
>>> [1]: https://arxiv.org/pdf/1405.2094.pdf "here"