[R-sig-ME] Different lmer results using contrasts() vs numeric coding
Dan McCloy
drmccloy at uw.edu
Tue Jan 26 22:48:06 CET 2016
Using numeric variables is not the same as hand-coding the contrasts. If
you pass in a numeric variable, the modeling function will treat it as a
continuous numeric predictor, and you will get one coefficient per
variable regardless of how many "levels" you represented numerically. Try
something like
# contrast matrix needs one row per factor level (here SC, IC, IIH)
contrasts(data$type_handcoded) <- cbind(c(-2, 1, 1), c(0, 1, -1))
to specify the contrast matrix by hand.
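As a rough illustration of the difference, here is a minimal sketch on made-up balanced data. The data frame `d`, its size, and the column names are assumptions for illustration only, not the poster's data. It assigns the contrasts by hand to factors, builds matching numeric columns, and checks that the full-model design matrices agree.

```r
# Toy balanced design; 'd' and its contents are hypothetical.
d <- expand.grid(type = c("SC", "IC", "IIH"),
                 priming = c("unprimed", "primed"),
                 rep = 1:4)
d$type    <- factor(d$type, levels = c("SC", "IC", "IIH"))
d$priming <- factor(d$priming, levels = c("unprimed", "primed"))

# Contrast matrices: one row per level, in level order.
contrasts(d$type)    <- cbind(c(-2, 1, 1), c(0, 1, -1))
contrasts(d$priming) <- cbind(c(-1, 1))

# Hand-coded numeric versions of the same contrast columns.
d$type1    <- unname(contrasts(d$type)[as.integer(d$type), 1])
d$type2    <- unname(contrasts(d$type)[as.integer(d$type), 2])
d$priming1 <- unname(contrasts(d$priming)[as.integer(d$priming), 1])

# Design matrices for the full fixed-effects structure.
X_factor  <- model.matrix(~ type + priming + type:priming, data = d)
X_numeric <- model.matrix(~ type1 + type2 + priming1 +
                            type1:priming1 + type2:priming1, data = d)

# Compare the numbers only, ignoring dimnames and attributes.
all.equal(as.vector(X_factor), as.vector(X_numeric))
```

With the full structure the two codings give the same design matrix, so any difference can only appear once terms are dropped.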
Dear R Mixed Models List,
I'm working on a LMM for psycholinguistic data with a 3x2 fixed effects
structure and crossed subject and item random effects. I ran into a
confusing result when I compared the use of contrasts() for fixed factor
variables vs 'hand-coding' these contrasts into numeric variables (with the
same values assigned using contrasts()).
The summaries for the two models that include all fixed factor terms are
identical. Here is the syntax used for each:
modelHandCoded <- lmer(invRT ~ 1 + type1 + type2 + priming1 +
type1:priming1 + type2:priming1 + (1|pp) + (1|word), data = data)
# numeric variables to define contrasts
table(data$type1)
  -2    1
1503 3014
table(data$type2)
  -1    0    1
1520 1503 1494
table(data$priming1)
  -1    1
2253 2264
modelContrasts <- lmer(invRT ~ 1 + type + priming + type:priming + (1|pp) +
(1|word), data = data)
# factor variables with contrasts
contrasts(data$type) # [,1] = 'type1' above, [,2] = 'type2' above
    [,1] [,2]
SC    -2    0
IC     1    1
IIH    1   -1
contrasts(data$priming) # [,1] = 'priming1' above
         [,1]
unprimed   -1
primed      1
However, when I remove the effect of the 3-level fixed factor 'type' (while
still including its interaction with the 2-level factor 'priming'), the two
models no longer produce the same results. Here is the syntax for the two
models without 'type':
modelHandCoded.NoType <- lmer(invRT ~ 1 + priming1 + type1:priming1 +
type2:priming1 + (1|pp) + (1|word), data = data)
modelContrasts.NoType <- lmer(invRT ~ 1 + priming + type:priming + (1|pp) +
(1|word), data = data)
The summary for the hand-coded model includes the 3 fixed effects that I
expected (priming1, type1:priming1, type2:priming1):
summary(modelHandCoded.NoType)
...
Fixed effects:
                 Estimate Std. Error        df t value Pr(>|t|)
(Intercept)     1.428e+00  3.244e-02 2.900e+01  44.031   <2e-16 ***
priming1        8.878e-03  3.572e-03 4.384e+03   2.485    0.013 *
type1:priming1  8.416e-04  2.527e-03 4.385e+03   0.333    0.739
type2:priming1 -1.790e-04  4.373e-03 4.383e+03  -0.041    0.967
However, the summary for the contrasts() model includes additional
interaction terms, one set for each level of 'priming':
summary(modelContrasts.NoType)
...
Fixed effects:
                 Estimate Std. Error        df t value Pr(>|t|)
(Intercept)     1.428e+00  3.243e-02 2.900e+01  44.046   <2e-16 ***
priming1        8.872e-03  3.572e-03 4.386e+03   2.484   0.0130 *
priming0:type1  6.289e-03  4.811e-03 3.130e+02   1.307   0.1921
priming1:type1  7.957e-03  4.802e-03 3.110e+02   1.657   0.0985 .
priming0:type2 -9.782e-03  8.323e-03 3.120e+02  -1.175   0.2408
priming1:type2 -1.009e-02  8.316e-03 3.110e+02  -1.213   0.2261
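One way to see where the extra terms come from is to inspect the design matrices directly with model.matrix(). The sketch below uses made-up balanced data (the data frame `d` and its values are hypothetical, not the data above). When a factor appears in an interaction without its own main effect, R codes the other factor in that term with indicator columns, so the factor version of the reduced formula keeps six columns while the numeric version has four.

```r
# Toy balanced design; 'd' and its contents are hypothetical.
d <- expand.grid(type = c("SC", "IC", "IIH"),
                 priming = c("unprimed", "primed"),
                 rep = 1:4)
d$type    <- factor(d$type, levels = c("SC", "IC", "IIH"))
d$priming <- factor(d$priming, levels = c("unprimed", "primed"))
contrasts(d$type)    <- cbind(c(-2, 1, 1), c(0, 1, -1))
contrasts(d$priming) <- cbind(c(-1, 1))

# Numeric copies of the contrast columns.
d$type1    <- unname(contrasts(d$type)[as.integer(d$type), 1])
d$type2    <- unname(contrasts(d$type)[as.integer(d$type), 2])
d$priming1 <- unname(contrasts(d$priming)[as.integer(d$priming), 1])

X_full        <- model.matrix(~ type + priming + type:priming, data = d)
X_red_factor  <- model.matrix(~ priming + type:priming, data = d)
X_red_numeric <- model.matrix(~ priming1 + type1:priming1 + type2:priming1,
                              data = d)

# Column names show 'type' nested within each level of 'priming'.
colnames(X_red_factor)
ncol(X_red_factor)   # same number of columns as the full model
ncol(X_red_numeric)  # two fewer columns

# If the combined rank equals ncol(X_full), the reduced factor model
# spans the same column space as the full model.
qr(cbind(X_full, X_red_factor))$rank
```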
When I compare each reduced model to the full model, I find that there's a
difference between the full model and the reduced hand-coded model, but not
between the full model and the reduced model using contrasts(). The
Df/AIC/BIC/LL for the latter two models are identical, so it appears that
removing the 'type' term had no effect. (This is true for comparisons with
both the hand-coded and contrasts() versions of the full model.) Here are
the results of the anova() for each comparison:
Model with full fixed-effects structure vs. hand-coded model with 'type'
removed:
       Df    AIC    BIC logLik deviance  Chisq Chi Df Pr(>Chisq)
..1     7 168.14 213.05 -77.07   154.14
object  9 167.20 224.94 -74.60   149.20 4.9406      2    0.08456 .
Model with full fixed-effects structure vs. contrast-coded model with
'type' removed:
       Df   AIC    BIC logLik deviance Chisq Chi Df Pr(>Chisq)
object  9 167.2 224.94  -74.6    149.2
..1     9 167.2 224.94  -74.6    149.2     0      0  < 2.2e-16 ***
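The same pattern can be reproduced without the random effects. The sketch below swaps lmer() for plain lm() and uses simulated noise as the response, so the data frame `d`, the response `y`, and the seed are all assumptions for illustration. It only checks whether the reduced factor-coded fit is a reparameterization of the full fit.

```r
set.seed(1)

# Toy balanced design with a simulated response; all hypothetical.
d <- expand.grid(type = c("SC", "IC", "IIH"),
                 priming = c("unprimed", "primed"),
                 rep = 1:10)
d$type    <- factor(d$type, levels = c("SC", "IC", "IIH"))
d$priming <- factor(d$priming, levels = c("unprimed", "primed"))
contrasts(d$type)    <- cbind(c(-2, 1, 1), c(0, 1, -1))
contrasts(d$priming) <- cbind(c(-1, 1))
d$type1    <- unname(contrasts(d$type)[as.integer(d$type), 1])
d$type2    <- unname(contrasts(d$type)[as.integer(d$type), 2])
d$priming1 <- unname(contrasts(d$priming)[as.integer(d$priming), 1])
d$y <- rnorm(nrow(d))

fit_full        <- lm(y ~ type + priming + type:priming, data = d)
fit_red_factor  <- lm(y ~ priming + type:priming, data = d)
fit_red_numeric <- lm(y ~ priming1 + type1:priming1 + type2:priming1,
                      data = d)

# Same number of coefficients and the same log-likelihood as the full
# model: dropping 'type' from the factor formula did not shrink the model.
length(coef(fit_full))
length(coef(fit_red_factor))
all.equal(as.numeric(logLik(fit_full)), as.numeric(logLik(fit_red_factor)))

# The numeric version really is smaller, by 2 parameters.
length(coef(fit_red_numeric))
anova(fit_red_numeric, fit_full)
```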
Can anyone explain why the two reduced models differ depending on whether
the fixed factor variables are hand-coded numeric vs. factors with
contrasts() assigned? Also, why is there no effect of removing a fixed
factor term when contrasts() are used? Apologies if I'm missing something
obvious!
Thanks,
Becky
_____________________________________________________
Dr Becky Gilbert
Research Associate
Psychology and Language Sciences
University College London
London WC1H 0AP
[[alternative HTML version deleted]]
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models