Dear all,
while working on a package for influential data in mixed models, I
encountered a peculiar situation. As I'll show in the examples below,
I found that (at least in some situations) the sequence in which
variables are entered in the model formula, influences the parameters
of the model. In most cases, these differences are really small, but I
did encounter an example which leads to rather huge differences.
Unfortunately, I cannot publicly share the data, nor the outcomes, of
that example, so I'll have to do with an example resulting in smaller
differences. Nevertheless, I wouldn't expect any differences (neither
small, nor big) when using exactly the same model formula on the same
set of data, but only with a different sequence in which the variables
are entered.
In order to provide a reproducible example, I use the ScotsSec data
from the mlmRev package, which I modify slightly. The first
modifications relate to a manipulated intercept and added 'dummy'-
variable, which is part of the procedure for determining influential
data (but, the exact reasons for this are not very relevant for this
problem). Note, however, that the intercept and the additional dummy
variable (estex.2) are each others' opposite.
library(mlmRev)
data(ScotsSec)
ScotsSec$intercept.alt <- ifelse(ScotsSec$second=="2", 0, 1)
ScotsSec$estex.2 <- ifelse(ScotsSec$second=="2", 1, 0)
I use these data, and its modifications, to estimate two models. Both
data and variables in the model formulae are identical, except that in
model.a the dummy-variable estex.2 is added as the last of the fixed
effects, whereas in model.b it is added directly after the modified
intercept-variable (intercept.alt). Also, in both models the
'standard' intercept is omitted. When the fixed effects of these two
models are compared, these are exactly identical as expected.
model.a <- lmer(attain ~ intercept.alt + verbal + social + estex.2 +
(0 + intercept.alt | primary) + (0 + intercept.alt|second) -1 ,
data=ScotsSec)
model.b <- lmer(attain ~ intercept.alt + estex.2 + verbal + social +
(0 + intercept.alt | primary) + (0 + intercept.alt|second) -1 ,
data=ScotsSec)
fixef(model.a)
fixef(model.b) # Identical
However, testing exactly these model formulae on a logistic set of
data, results in (slightly) different estimates for the fixed effects.
To illustrate this, I created a binary outcome variable ('pass'), and
specify the same models as above, with the addition of the
family="binomial" parameter.
ScotsSec$pass <- ifelse(ScotsSec$attain > 5, 1,0)
model.c <- lmer(pass ~ intercept.alt + verbal + social + estex.2 +
(0 + intercept.alt | primary) + (0 + intercept.alt|second) -1 ,
data=ScotsSec,
family="binomial")
model.d <- lmer(pass ~ intercept.alt + estex.2 + verbal + social +
(0 + intercept.alt | primary) + (0 + intercept.alt|second) -1 ,
data=ScotsSec,
family="binomial")
fixef(model.c)
fixef(model.d) # Not Identical
To my surprise, now the fixed effects differ. How is this possible,
and, why does this only seem to occur using glmer? I would not have
expected the model outcomes to be subject to the sequence in which the
variables are specified.
I would be grateful if anyone could shed some light on this (to me)
rather peculiar situation,
with kind regards,
Rense Nieuwenhuis
[[alternative HTML version deleted]]