[R-sig-ME] Interpreting lmer() interactions with Helmert contrasts

Steven McKinney smckinney at bccrc.ca
Mon Aug 24 20:22:01 CEST 2015

Hi Becky,

For a model containing A + B + A:B we have two situations

Case 1)  Interest in A, but with the need to have B in the model (B's parameter is a nuisance parameter - B needs to be in the model to adjust for an important factor so that the model behaves properly, but B is not the factor we are interested in testing).  This is what I saw as the relevant situation in your case (you seemed to want to test Time, while adjusting for WordType).

Case 2)  Interest in the relevance of both A and B  (discussed in the dialog to which you linked below)

The discussion you link to below provides this hierarchy of models

# Binomial mixed models: glmer() in current lme4 (older releases accepted lmer(..., family=))
l.full = glmer(response ~ A + B + A:B + (1 + A | sub) + (1 | item), data, family="binomial")
l.AB = glmer(response ~ A + B + (1 + A | sub) + (1 | item), data, family="binomial")
l.A = glmer(response ~ A + (1 + A | sub) + (1 | item), data, family="binomial")
l.B = glmer(response ~ B + (1 | sub) + (1 | item), data, family="binomial")

but omits

l.reduced = glmer(response ~ (1 | sub) + (1 | item), data, family="binomial")

i.e. the model with neither A nor B.

Case 1)  If we are interested in A, the omnibus test is

anova (l.B, l.full)

If this test is significant, and the contribution of A to the model is of a size of scientific relevance, then you can declare A as a significant model component and begin to investigate the functional form of that contribution.

The next step would be to test the interaction.  If that is significant and of relevant scientific size, then A is important, but its contribution differs for different levels of B.  If the interaction is not significant, and the sample size was large enough to detect differences of importance, then the interaction term can be dropped and the main effects model best summarizes the association.
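A minimal sketch of the Case 1 sequence, run on simulated data because the data from the thread are not available. The data frame `d`, the columns `y`, `sub` and `item`, and the effect sizes are all hypothetical. To keep the sketch small it uses random intercepts only and a Gaussian response fitted with lmer(), with REML = FALSE so that the likelihood ratio tests compare maximum-likelihood fits.

```r
library(lme4)

set.seed(1)
# Hypothetical data: 20 subjects x 12 items x 2 levels of A; B varies by item.
d <- expand.grid(sub = factor(1:20), item = factor(1:12),
                 A = factor(c("a1", "a2")))
d$B <- factor(c("b1", "b2", "b3")[(as.integer(d$item) - 1) %% 3 + 1])
sub_eff  <- rnorm(20, sd = 0.3)
item_eff <- rnorm(12, sd = 0.2)
# Assumed truth: A matters only at levels b2 and b3 of B.
d$y <- sub_eff[as.integer(d$sub)] + item_eff[as.integer(d$item)] +
  0.5 * (d$A == "a2") * (d$B != "b1") + rnorm(nrow(d), sd = 0.5)

l.full <- lmer(y ~ A + B + A:B + (1 | sub) + (1 | item), data = d, REML = FALSE)
l.AB   <- lmer(y ~ A + B       + (1 | sub) + (1 | item), data = d, REML = FALSE)
l.B    <- lmer(y ~ B           + (1 | sub) + (1 | item), data = d, REML = FALSE)

# Step 1: omnibus test for A (main effect plus interaction, 3 df here).
omnibus_A <- anova(l.B, l.full)
print(omnibus_A)

# Step 2: test the interaction on its own (2 df here).
interaction_AB <- anova(l.AB, l.full)
print(interaction_AB)
```

The degrees of freedom follow from the design assumed here (A with two levels, B with three); with other factors they will differ.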

Case 2) If we are interested in both A and B, the omnibus test for their joint relevance is

anova(l.reduced, l.full)

This test was not discussed in the link you provided.

If this test yields a non-significant p-value, then A and B are not contributing to improving the model fit and their usefulness is questionable if the data set size was large enough to detect effect sizes of scientific relevance.

If the p-value is small, then of course we need to assess whether the improved model fit is telling us anything of scientific value.  (All p-values get small when data set sizes get large - so then the question is the relevance of the degree of association.)
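The point about p-values and sample size can be illustrated with some back-of-the-envelope arithmetic. This sketch is not a mixed model: it assumes a simple one-sample z-test with a known standard deviation of 1 and a fixed, tiny observed mean difference of 0.01 (both numbers are hypothetical).

```r
# Hypothetical: observed mean difference of 0.01 SD units, known SD = 1.
effect <- 0.01
n <- c(1e2, 1e4, 1e6)

# z statistic and two-sided p-value for a one-sample z-test at each n.
z <- effect * sqrt(n)
p <- 2 * pnorm(-abs(z))

data.frame(n = n, z = z, p = signif(p, 3))
```

The same negligible difference goes from clearly non-significant to overwhelmingly significant as n grows, which is why the size of the effect has to be judged separately from the p-value.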

The problem with the discussion you link to is that two test results were posited to assess the relevance of both A and B

"if anova(l.full, l.A) is significant, B has an effect (main effect or interaction).
if anova(l.full, l.B) is significant, A has an effect (main effect or interaction)."

so there are two tests and no discussion of adjustment for multiple comparisons.  The omnibus test anova( l.reduced, l.full ) tests both A and B simultaneously in one test at the stated type I error rate.

If that test is significant, and the effect sizes of A and B in the model are more than just trivially small differences of no scientific or biological or medical relevance, then you can start assessing the nature of their joint contribution to the model.  The first thing to look at would then be the interaction term.  If that is significant, and of a relevant size, you are done:  both A and B are important, but the contribution of A depends on the level of B.  If the interaction is not significant, then you can look at A and B individually and see which is contributing to the model fit at a level of scientific or biological or medical relevance.
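A rough illustration of why two unadjusted tests are a concern. The first number assumes the two tests are independent, which is an assumption made only for this arithmetic; tests of A and of B on the same data generally will not be. The Bonferroni bound holds whatever the dependence.

```r
alpha <- 0.05
k <- 2  # two separate tests: one for A, one for B

# Family-wise error rate if the k tests were independent (an assumption).
fwer_independent <- 1 - (1 - alpha)^k

# Bonferroni upper bound on the family-wise error rate, valid under any dependence.
fwer_bound <- k * alpha

# Per-test threshold that keeps the family-wise rate at or below alpha.
alpha_bonferroni <- alpha / k

c(fwer_independent = fwer_independent, fwer_bound = fwer_bound,
  alpha_bonferroni = alpha_bonferroni)
```

A single omnibus test avoids the issue because only one test is carried out at the stated alpha.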

Steven McKinney, Ph.D.

Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

Molecular Oncology
675 West 10th Ave, Floor 4, Room 4.122
Vancouver B.C.
V5Z 1L3

From: Becky Gilbert <beckyannegilbert at gmail.com>
Sent: August 24, 2015 4:54 AM
To: Steven McKinney
Cc: Ken Beath; Dan McCloy; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Interpreting lmer() interactions with Helmert contrasts

Thanks very much everyone for the responses.

@Dan: Thank you for the recommendation about my factor contrast coefficients.  I hadn't given much thought to the sign/level association, but now that you point it out, it seems obvious that I should do it the way you describe.  Here are the model coefficients with recoded contrasts:

> contrasts(rtData$Time)
   [,1]
-1 -0.5  # pre-test
1   0.5  # post-test

> contrasts(rtData$WordType)
        [,1]        [,2]
0 -0.6666667  0.0  # untrained
1  0.3333333  0.5  # trained-related
2  0.3333333 -0.5  # trained-unrelated
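For readers who want to reproduce contrast matrices like the ones printed above, here is one way such a coding could be set up in base R. How the contrasts were actually assigned to rtData is not shown in the thread, so the toy factors below are assumptions that only reuse the level labels from the output.

```r
# Toy factors using the level labels shown in the printed output.
Time <- factor(c(-1, 1), levels = c(-1, 1))
WordType <- factor(c(0, 1, 2), levels = c(0, 1, 2))

# One column for the two-level factor: pre-test = -0.5, post-test = 0.5.
contrasts(Time) <- matrix(c(-0.5, 0.5), ncol = 1)

# Helmert-style columns: (1) trained vs untrained, (2) related vs unrelated.
contrasts(WordType) <- cbind(c(-2/3, 1/3, 1/3), c(0, 0.5, -0.5))

contrasts(Time)
contrasts(WordType)
```

Each column sums to zero, so with balanced data the intercept is the grand mean and each coefficient is a difference between (averages of) level means.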

                  Estimate    Std. Error  t value
(Intercept)       2.8765116   0.0177527   162.03
WordType1        -0.0111628   0.0110852    -1.01
WordType2        -0.0007306   0.0071519    -0.10
Time1             0.0268310   0.0195248     1.37
WordType1:Time1   0.0301627   0.0115349     2.61
WordType2:Time1  -0.0089123   0.0141624    -0.63

My interpretations of the interaction coefficients are:
1) log RT increases (i.e. RTs slow down) for the two trained (vs untrained) Word Types at post-test (Time = 1)
2) log RT decreases (i.e. RTs speed up) for the trained-related (vs trained-unrelated) Word Type at post-test (Time = 1).
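As a check on how the interaction coefficients can be read, the sketch below uses only the estimates from the table above and the contrast codes from contrasts(rtData$WordType) to work out the post-test minus pre-test change in log RT for each word type. The short names in `b` and `codes` are labels chosen for this illustration.

```r
# Fixed-effect estimates copied from the table above.
b <- c(Time1 = 0.0268310, W1T = 0.0301627, W2T = -0.0089123)

# Contrast codes for each WordType level (rows of contrasts(rtData$WordType)).
codes <- rbind(untrained           = c(w1 = -2/3, w2 =  0.0),
               "trained-related"   = c(w1 =  1/3, w2 =  0.5),
               "trained-unrelated" = c(w1 =  1/3, w2 = -0.5))

# Time is coded -0.5 / +0.5, so post minus pre is a one-unit change in Time1.
# Post-minus-pre difference in log RT for each word type:
time_effect <- b[["Time1"]] + b[["W1T"]] * codes[, "w1"] +
  b[["W2T"]] * codes[, "w2"]
round(time_effect, 4)

# The interaction coefficients are differences between these changes:
mean(time_effect[2:3]) - time_effect[[1]]  # trained (averaged) minus untrained
time_effect[[2]] - time_effect[[3]]        # trained-related minus trained-unrelated
```

Under this coding, WordType1:Time1 is the difference in pre-to-post change between the trained types (averaged) and the untrained type, and WordType2:Time1 is the difference in pre-to-post change between trained-related and trained-unrelated.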

However, this doesn't really answer my original question about how to assess (and report) the contribution of these two interactions to the model fit.  Obviously the t statistic is larger for the WordType1:Time1 coefficient than for the WordType2:Time1 coefficient, but that only tells me their relative contributions - I would need to know the degrees of freedom to get p-values, which I understand is not straightforward.  Also, I've read that the t statistics for coefficients that are output by summary() for an lmer model are sequential tests and thus not the appropriate/desired statistics for assessing the contribution of factors (someone please correct me if I'm wrong!).  Hence my reason for using LRTs to assess this.  This still leaves me with the problem of not being able to test the interactions between Time and the two contrasts for WordType - I can test the whole WordType factor and the Time:WordType interaction via LRTs, but not each contrast within WordType.
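One approach that is sometimes used for a single-contrast LRT, offered here as a sketch rather than as something proposed in this thread: copy each contrast column into a numeric covariate, so that one interaction column can be dropped while the others stay in the model. The simulated data frame `d` and the column names logRT, Subject and Item are hypothetical stand-ins for rtData, and random intercepts only are used for brevity.

```r
library(lme4)

set.seed(2)
# Hypothetical stand-in for rtData; logRT, Subject and Item are assumed names.
d <- expand.grid(Subject = factor(1:20), Item = factor(1:12),
                 Time = factor(c(-1, 1)))
d$WordType <- factor((as.integer(d$Item) - 1) %% 3)
contrasts(d$Time) <- matrix(c(-0.5, 0.5), ncol = 1)
contrasts(d$WordType) <- cbind(c(-2/3, 1/3, 1/3), c(0, 0.5, -0.5))
d$logRT <- 2.9 + rnorm(20, sd = 0.1)[as.integer(d$Subject)] +
  rnorm(12, sd = 0.05)[as.integer(d$Item)] + rnorm(nrow(d), sd = 0.1)

# Copy each contrast column into a numeric covariate.
d$T1 <- unname(contrasts(d$Time)[as.character(d$Time), 1])
d$W1 <- unname(contrasts(d$WordType)[as.character(d$WordType), 1])
d$W2 <- unname(contrasts(d$WordType)[as.character(d$WordType), 2])

m_full <- lmer(logRT ~ T1 + W1 + W2 + W1:T1 + W2:T1 +
                 (1 | Subject) + (1 | Item), data = d, REML = FALSE)
# Drop only the WordType1:Time1 column; everything else stays in the model.
m_drop <- lmer(logRT ~ T1 + W1 + W2 + W2:T1 +
                 (1 | Subject) + (1 | Item), data = d, REML = FALSE)

anova(m_drop, m_full)  # 1-df likelihood ratio test for that single contrast
```

Because the covariates are numeric, R does not force both interaction columns in or out together, which is what makes the 1-df comparison possible. The simulated response here contains no real effects, so the p-value itself is meaningless.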

@Steven: thanks for your explanation re interpreting main effects in the presence of an interaction, and of the Chi-square LRTs for assessing the contribution of factors/terms.

However I'm confused by this:

"An omnibus test for the statistical significance of a variable of interest (say variable A), when that variable is in a model involving an interaction with another variable (say variable B) will test the interaction term A:B and the main effect A.  The full model has A + B + A:B and the reduced model has only B.  Thus a proper omnibus test for the usefulness of A in the model will involve the interaction A:B and the main effect A.  This test really should be done before testing A:B for proper multiple comparisons control."

Is this what you're saying?

1. test A: (A + B + A:B) vs (B)
2. test B: (A + B + A:B) vs (A)
then, if either of the above is significant:
3. test A:B: (A + B + A:B) vs (A + B)

Which I think is the procedure described here: https://mailman.ucsd.edu/pipermail/ling-r-lang-l/2011-October/000305.html
Assuming this is what you meant, will this procedure always get you to step 3 (assessing the interaction) in the case of a significant interaction without main effects (as in a cross-over interaction)?  Sorry if I've completely misunderstood!
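The cross-over question can be explored with a small simulation. This sketch swaps in an ordinary linear model (lm() with F tests) for the mixed model to keep it short, and the design, effect size and noise level are all hypothetical. It shows one simulated case, not a general guarantee.

```r
set.seed(3)
# Hypothetical balanced 2 x 2 design with a pure cross-over interaction:
# the cell means are +/-0.5 and both marginal (main) effects are zero.
d <- expand.grid(A = c(-0.5, 0.5), B = c(-0.5, 0.5), rep = 1:50)
d$y <- 2 * d$A * d$B + rnorm(nrow(d), sd = 0.5)

full   <- lm(y ~ A + B + A:B, data = d)
onlyB  <- lm(y ~ B, data = d)
mainAB <- lm(y ~ A + B, data = d)

# Step 1: omnibus test of A (main effect + interaction) against the B-only model.
step1 <- anova(onlyB, full)
# Step 3: the interaction on its own.
step3 <- anova(mainAB, full)

step1[["Pr(>F)"]][2]
step3[["Pr(>F)"]][2]
```

Because the step 1 comparison drops A:B along with A, a strong interaction contributes to that test even when the main effect of A is zero.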



Dr Becky Gilbert (nee Prince)

