[R-sig-ME] What is the appropriate zero-correlation parameter model for factors in lmer?
Rune Haubo
rune@h@ubo @ending from gm@il@com
Fri May 25 10:27:26 CEST 2018
On 24 May 2018 at 21:22, Reinhold Kliegl <reinhold.kliegl using gmail.com> wrote:
> Here is an update (final for now) with 10 models and 4 hierarchical
> sequences. Comments, details, and a demonstration with the Machines data are
> available in a new RPub[1]. I switched to a notation with numeric covariates
> to reduce potential confusion about terms containing `1+f` and `0+f`; `c1`
> and `c2` are contrasts defined for a factor with levels `A`, `B`, and `C`.
>
> ## LMMs
>
> ### Prototype LMMs
>
> max_LMM: m1 = y ~ 1 + c1 + c2 + (1 + c1 + c2 | subj)
> prsm1_LMM: m2 = y ~ 1 + c1 + c2 + (1 | subj) + (0 + c1 + c2 | subj)
> zcp_LMM: m3 = y ~ 1 + c1 + c2 + (1 + c1 + c2 || subj)
> prsm2_LMM: m4 = y ~ 1 + c1 + c2 + (1 + c2 || subj)
> min_LMM: m5 = y ~ 1 + c1 + c2 + (1 | subj)
>
> Protoype refers to prsm1 and prsm2; there are also variations of them. Prsm1
> are models pruning correlation parameters; prsm2 are models pruning variance
> components in the absence of correlation parameters. The order reflects
> hierarchical decreasing model complexity.
>
> ### LMMs with interaction term
>
> int_LMM: m6 = y ~ 1 + c1 + c2 + (1 | subj) + (1 | factor:subj)
> min2_Lmm: m7 = y ~ 1 + c1 + c2 + (1 | factor:subj)
>
> ### LMMs with (0 + f | g) RE structures
>
> maxL_MM_RE0: m8 = y ~ 1 + c1 + c2 + (0 + A + B + C | subj)
> prsm1_LMM_RE0: m9 = y ~ 1 + c1 + c2 + (0 + A | subj) + (0 + B + C | subj)
> zcp_LMM_RE0: m10 = y ~ 1 + c1 + c2 + (0 + A + B + C || subj)
>
> ## Hierarchical model sequences
>
> Here are the sequences I am confident about.
>
> (1) max_LMM -> prsm1_LMM -> zcp_LMM -> prsm2_LMM -> min1_LMM
> (2) max_LMM -> int_LMM -> min1_LMM
> (3) max_LMM -> int_LMM -> min2_LMM
> (4) max_LMM_RE0 -> prsm1_LMM_RE0 -> zcp_LMM_RE0
>
> So far, in my research, I have worked almost exclusively with Sequence (1)
> and do not recall ever experiencing technical problems or inconsistencies as
> long as there was no overparameterization. It has served me well in the
> determination of parsimonious LMMs[2, 3].
>
> For complex fixed-effect structures (i.e., for models with many factors or
> with factors with many levels), the number of correlation parameters grows
> very rapidly. If this goes together with a modest number of observations or
> levels of the random factor, Sequences (2) and (3) might be a good place to
> start to avoid convergence problems. Of course, if your hypotheses are about
> correlation parameters, these LMMs will not get you very far.
>
> Finally, Sequence (4) could be a default strategy if one is interested in
> the fixed effects and if one does not want to spend much time in wondering
> about the meaning of CPs. However, as correlations between levels of fixed
> factors are can be very large (at least larger than correlations between
> effects), LMMs in this sequence may be prone to convergence problems.
>
> A few open questions -
> (1) Rune Haubo proposed that min2_LMM (his fm2) is nested under zcp_LMM_RE0
> (his fm3) and could serve as a baseline model, but I don't understand why it
> is nested.
> (2) Marteen Jung wonders about Rune Haubo's fm5 (with four VCs; there was
> typo in my last post). The ten models above have at most 3 VCs (i.e., the
> number of levels of the factor). I also don't see a good place for it in the
> sequences. Am I missing something?
> (3) Any suggestions for models between maxLMM and intLMM and for models
> between intLMM and minLMM?
I must admit that I haven't read all of the above in detail but I can
address the
specific questions.
First a note on terminology: My understanding is that a variance component (VC)
is an random-effects term that is independent of other terms and
classically a term of the form
(1 | g) or (1 | g:f). If we also allow the VC-terminology for random terms with
vector-valued random-effects, e.g. (g | f), then arguably none of the models
have more than two VCs (excluding residuals).
For the Machines data fm5 reads
fm5 <- lmer(score ~ Machine + (1 | Worker) +
(0 + dummy(Machine, "A") | Worker) +
(0 + dummy(Machine, "B") | Worker) +
(0 + dummy(Machine, "C") | Worker), Machines)
which may also be specified as (thanks Reinhold Kliegl)
mm0 <- model.matrix(~ 0 + Machine, Machines)
A <- mm0[, 1]
B <- mm0[, 2]
C <- mm0[, 3]
fm5b <- lmer(score ~ Machine + (1 | Worker) + (0 + A + B + C ||
Worker), Machines)
anova(fm5, fm5b, refit=FALSE) # Chisq = 0
and there other variants as well.
Other relevant models to be discussed are:
fm3 <- lmer(score ~ Machine + (0 + A + B + C || Worker), Machines)
fm2 <- lmer(score ~ Machine + (1| Worker:Machine), Machines)
fm6 <- lmer(score ~ Machine + (0 + Machine | Worker), Machines)
Since observations on different Workers are independent we can understand these
models by considering how they parameterize the variance-covariance
matrix of the
9 observations that belong to each Worker in the marginal distribution
of the observations.
fm5, uses 5 variance-parameters to parameterize the by-Worker
(9x9) variance-covariance matrix, Cov(Y_j): 1 for (1 | Worker), 3 for
(0 + A + B + C || Worker) and 1 for the residual variance.
The covariance structure for within-Worker observations is:
- The variance of each observation is sigma_w^2 + sigma_i^2 + sigma^2 where
the terms refer to Worker, the i'th Machine and residuals.
- The covariance between two different obervations from the same Machine and
Worker is sigma_w^2 + sigma_i^2, i.e. it depends on which Machine was used.
- The covariance between two different observations on the same Worker but
different Machines is just sigma_w^2.
Thus, if (1 | Worker) is removed from fm5 it reduces to fm3 and it amounts to
setting sigma_w^2 to zero. In effect the covariance betwen two different
observations on the same Worker but different Machines is 0.
If instead we consider model fm6 which corresponds to exchanging all random
terms in fm5 with (0 + Machine | Worker) it changes the covariance of two
observations from the same Worker and different Machines from a constant
sigma_w^2 to sigma_{i,i'}^2, i.e., it depends on which pair of Machines the
two observations come from. Since there are 6 possible pairs of Machines fm6
uses 6+1 variance parameters to parameterize the Worker-level covariance matrix,
i.e. 2 more than fm5.
For the Machines data fm2 reads
fm2 <- lmer(score ~ Machine + (1 | Worker:Machine), Machines)
This model says that the covariance between two different observations
on the same
Worker and same Machine is sigma_{mw}^2 thus it is different from fm3 by setting
sigma_i^2 to the constant (wrt. Machine) sigma_{mw}^2. With 3 Machines fm2
saves 2 variance-parameters relative to fm3 and ends up using only 2 variance
parameters. For completeness, the variance of an observation is
sigma_{mw}^2 + sigma^2
and the covariance of any two observations which are not from the same
Worker and the
same Machine is 0.
Now, I skipped a number of details but an email is not exactly suited for a
mathematical exposition so I hope the above makes sense, and I hope it
helps explain the nesting structure.
Cheers
Rune
>
> Best
> Reinhold Kliegl
>
> [1] http://rpubs.com/Reinhold/391828
> [2] https://arxiv.org/abs/1506.04967
> [3] https://www.sciencedirect.com/science/article/pii/S0749596X17300013
>
>
>
> On Tue, May 22, 2018 at 1:17 PM, Reinhold Kliegl <reinhold.kliegl using gmail.com>
> wrote:
>>
>> There is an interpretable alternative to fm5 (actually there are many
>> ...), called fm8 below, that avoids the redundancy between variance
>> components. The change is to switch from (1 |g) + (0 + f | g) = (1 | g) +
>> (0 + A + B + C | g) to 1 | g) + (0 + c1 + c2 |g ), where c1 and c2 are the
>> contrasts defined for f. (I have actually used such LMMs quite often.) With
>> this specification the difference to the maxLMM (fm6) is that the
>> correlation between intercept and contrasts is suppressed to zero. The
>> correlation parameters now refer to the correlations between effects of c1
>> and c2, not to the correlations between A, B, and C. Actually, this is but
>> one example of many LMMs one could slot into this position of the
>> hierarchical model sequences. At this level of model complexity one can
>> suppress various subsets of correlation parameters (as illustrated in Bates
>> et al. (2015)[1] and various vignettes of the RePsychLing package).
>>
>>
>> fm1 = y ~ 1 + f + (1 | g) # minimal LMM version 1
>> (min1LMM)
>> fm2 = y ~ 1 + f + (1 | f:g) # minimal LMM version 2
>> (min2LMM)
>> fm3 = y ~ 1 + f + (0 + f || g) # zcpLMM with 0 in RE
>> (zcpLMM_RE0)
>> fm4 = y ~ 1 + f + (1 | g) + (1 | f:g) # LMM w/ f x g interaction
>> (intLMM)
>> fm5 = y ~ 1 + f + (1 | g) + (0 + f | g) # N/A
>> fm6 = y ~ 1 + f + (1 + f | g) # maximal LMM (maxLMM)
>> fm7 = y ~ 1 + f + (1 + f || g) # zcpLMM with 1 in RE
>> (zcpLMM_RE1)
>> fm8 = y ~ 1 + f + (1 | g) + (0 + c1 + c2 | g) # parsimonious LMM
>> (prsmLMM)
>>
>> Hierarchical model sequences
>>
>> (1) maxLMM_RE1 -> prsmLMM -> intLMM -> min1LMM # fm6 -> fm8 -> fm4 ->
>> fm1
>> (2) maxLMM_RE1 -> prsmLMM -> intLMM -> min2LMM # fm6 -> fm8 -> fm4 ->
>> fm2
>> (3) maxLMM_RE0 -> prsmLMM -> zcpLMM_RE0 -> min2LMM # fm6 -> fm8 -> fm3 ->
>> fm2
>> (4) maxLMM_RE1 -> prsmLMM -> zcpLMM_RE1 -> min1LMM # fm6 -> fm8 -> fm7 ->
>> fm1 (new sequence)
>> ```
>>
>> I will update the RPub in the next days.
>>
>> [1] https://arxiv.org/pdf/1506.04967.pdf
>>
>>
>> Best regards,
>> Reinhold Kliegl
>>
>> On Tue, May 22, 2018 at 11:00 AM, Maarten Jung
>> <Maarten.Jung using mailbox.tu-dresden.de> wrote:
>>>
>>> I see that fm2 is nested within fm3 and fm4.
>>> But I have a hard time understanding fm3 and fm2 because, as Reinhold
>>> Kiegl said, they specify the f:g interaction but without the g main effect.
>>> Can someone provide an intuition for these models?
>>>
>>> Also, it is not entirely clear to me what fm5 represents. It looks to me,
>>> and again I am with Reinhold Kiegl , as if there were over-parameterization
>>> going on.
>>>
>>> Cheers,
>>> Maarten
>>>
>>> On Tue, May 22, 2018 at 9:45 AM, Reinhold Kliegl
>>> <reinhold.kliegl using gmail.com> wrote:
>>>>
>>>> Ok, I figured out the answer to the question about fm2.
>>>>
>>>> fm2 is indeed a very nice baseline for fm3 and fm4. So I distinguish
>>>> between min1LMM and min2LMM.
>>>>
>>>> fm1 = y ~ 1 + f + (1 | g) # minimal LMM version 1
>>>> (min1LMM)
>>>> fm2 = y ~ 1 + f + (1 | f:g) # minimal LMM version 2
>>>> (min2LMM)
>>>> fm3 = y ~ 1 + f + (0 + f || g) # zcpLMM with 0 in RE
>>>> (zcpLMM_RE0)
>>>> fm4 = y ~ 1 + f + (1 | g) + (1 | f:g) # LMM w/ f x g interaction
>>>> (intLMM)
>>>> fm5 = y ~ 1 + f + (1 | g) + (0 + f | g) # N/A
>>>> fm6 = y ~ 1 + f + (1 + f | g) # maximal LMM (maxLMM)
>>>> fm7 = y ~ 1 + f + (1 + f || g) # zcpLMM with 1 in RE
>>>> (zcpLMM_RE1)
>>>>
>>>>
>>>> (1) maxLMM_RE1 -> intLMM -> min1LMM # fm6 -> fm4 -> fm1
>>>> (2) maxLMM_RE1 -> intLMM -> min2LMM # fm6 -> fm4 -> fm2
>>>> (3) maxLMM_RE0 -> zcpLMM_RE0 -> min2LMM # fm6 -> fm3 -> fm2
>>>> (4) maxLMM_RE1 -> zcpLMM_RE1 -> min1LMM # fm6 -> fm7 -> fm1 (new
>>>> sequence)
>>>>
>>>>
>>>> On Tue, May 22, 2018 at 12:21 AM, Reinhold Kliegl
>>>> <reinhold.kliegl using gmail.com> wrote:
>>>>>
>>>>> Sorry, I am somewhat late to this conversation. I am responding to this
>>>>> thread, because it fits my comment very well, but it was initially triggered
>>>>> by a previous thread, especially Rune Haubo's post here [1]. So I hope it is
>>>>> ok to continue here.
>>>>>
>>>>> I have a few comments and questions. For details I refer to an RPub I
>>>>> put up along with this post [2]. I start with a translation between Rune
>>>>> Haubo's fm's and the terminology I use in the RPub:
>>>>>
>>>>> fm1 = y ~ 1 + f + (1 | g) # minimal LMM (minLMM)
>>>>> fm3 = y ~ 1 + f + (0 + f || g) # zero-corr param LMM with 0 in
>>>>> RE (zcpLMM_RE0)
>>>>> fm4 = y ~ 1 + f + (1 | g) + (1|f:g) # LMM w/ fixed x random factor
>>>>> interaction (intLMM),
>>>>> fm6 = y ~ 1 + f + (1 + f | g) # maximal LMM (maxLMM)
>>>>> fm7 = y ~ 1 + f + (1 + f || g) # zero-corr param LMM with 1 in
>>>>> RE (zcpLMM_RE1)
>>>>>
>>>>> Notes: f is a fixed factor, g is a group (random) factor; fm1 to fm6
>>>>> are in Rune Haubo's post; fm7 is new (added by me). I have not used fm2 and
>>>>> fm5 so far (see below).
>>>>>
>>>>> (I) The post was triggered by the question whether intLMM is nested
>>>>> under zcpLMM. I had included this LRT in my older RPub cited in the thread,
>>>>> but I stand corrected and agree with Rune Haubo that intLMM is not nested
>>>>> under zcpLMM. For example, in the new RPub, I show that slightly modified
>>>>> Machines data exhibit smaller deviance for intLMM than zcpLMM despite an
>>>>> additional model parameter in the latter. Thanks for the critical reading.
>>>>>
>>>>>
>>>>> (II) Here are Runo Haubo's sequences (left, resorted) augmented with my
>>>>> translation (right)
>>>>>
>>>>> (1) fm6 -> fm5 -> fm4 -> fm1 # maxLMM_RE1 -> fm5 -> intLMM ->
>>>>> minLMM
>>>>> (2) fm6 -> fm5 -> fm4 -> fm2 # maxLMM_RE1 -> fm5 -> intLMM -> fm2
>>>>> (3) fm6 -> fm5 -> fm3 -> fm2 # maxLMM_RE1 -> fm5 -> zcpLMM_RE0 -> fm2
>>>>>
>>>>> and here are sequences I came up with (left) augmented with translation
>>>>> into RH's fm's.
>>>>>
>>>>> (1) maxLMM_RE1 -> intLMM -> minLMM # fm6 -> fm4 -> fm1
>>>>> (3) maxLMM_RE0 -> zcpLMM_RE0 # fm6 -> fm3
>>>>> (4) maxLMM_RE1 -> zcpLMM_RE1 -> minLMM # fm6 -> fm7 -> fm1 (new
>>>>> sequence)
>>>>>
>>>>>
>>>>> (III) I have questions about fm2 and fm5.
>>>>> fm2: fm2 redefines the levels of the group factor (e.g., in the cake
>>>>> data there are 45 groups in fm2 compared to 15 in the other models). Why is
>>>>> fm2 nested under fm3 and fm6? Somehow it looks to me that you include an f:g
>>>>> interaction without the g main effect (relative to fm4). This looks like an
>>>>> interesting model; I would appreciate a bit more conceptual support for its
>>>>> interpretation in the model hierarchy.
>>>>> fm5: fm5 specifies 4 variance components (VCs), but the factor has
>>>>> only 3 levels. So to me this looks like there is redundancy built into the
>>>>> model. In support of this intuition, for the cake data, one of the VCs is
>>>>> estimated with 0. However, in the Machine data the model was not degenerate.
>>>>> So I am not sure. In other words, if the factor levels are A, B, C, and the
>>>>> two contrasts are c1 and c2, I thought I can specify either (1 + c1 + c2) or
>>>>> (0 + A + B + C). fm5 specifies (1 + A + B + C) which is rank deficient in
>>>>> the fixed effect part, but not necessarily in the random-effect term. What
>>>>> am I missing here?
>>>>>
>>>>> [1]
>>>>> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026775.html
>>>>> [2] http://rpubs.com/Reinhold/391027
>>>>>
>>>>> Best,
>>>>> Reinhold Kliegl
>>>>>
>>>>>
>>>>> On Thu, May 17, 2018 at 12:43 PM, Maarten Jung
>>>>> <Maarten.Jung using mailbox.tu-dresden.de> wrote:
>>>>> >
>>>>> > Dear list,
>>>>> >
>>>>> > When one wants to specify a lmer model including variance components
>>>>> > but no
>>>>> > correlation parameters for categorical predictors (factors) afaik one
>>>>> > has
>>>>> > to convert the factors to numeric covariates or use lme4::dummy().
>>>>> > Until
>>>>> > recently I thought m2a (or equivalently m2b using the double-bar
>>>>> > syntax)
>>>>> > would be the correct way to specify such a zero-correlation parameter
>>>>> > model.
>>>>> >
>>>>> > But in this thread [1] Rune Haubo Bojesen Christensen pointed out
>>>>> > that this
>>>>> > model does not make sense to him. Instead he suggests m3 as an
>>>>> > appropriate
>>>>> > model.
>>>>> > I think this is a *highly relevant difference* for everyone who uses
>>>>> > factors in lmer and therefore I'm bringing up this issue again. But
>>>>> > maybe
>>>>> > I'm mistaken and just don't get what is quite obvious for more
>>>>> > experienced
>>>>> > mixed modelers.
>>>>> > Please note that the question is on CrossValidated [2] but some
>>>>> > consider it
>>>>> > as off-topic and I don't think there will be an answer any time soon.
>>>>> >
>>>>> > So here are my questions:
>>>>> > How should one specify a lmm without correlation parameters for
>>>>> > factors and
>>>>> > what are the differences between m2a and m3?
>>>>> > Is there a preferred model for model comparison with m4 (this model
>>>>> > is also
>>>>> > discussed here [3])?
>>>>> >
>>>>> > library("lme4")
>>>>> > data("Machines", package = "MEMSS")
>>>>> >
>>>>> > d <- Machines
>>>>> > contrasts(d$Machine) # default coding: contr.sum
>>>>> >
>>>>> > m1 <- lmer(score ~ Machine + (Machine | Worker), d)
>>>>> >
>>>>> > c1 <- model.matrix(m1)[, 2]
>>>>> > c2 <- model.matrix(m1)[, 3]
>>>>> > m2a <- lmer(score ~ Machine + (1 | Worker) + (0 + c1 | Worker) + (0 +
>>>>> > c2 |
>>>>> > Worker), d)
>>>>> > m2b <- lmer(score ~ Machine + (c1 + c2 || Worker), d)
>>>>> > VarCorr(m2a)
>>>>> > Groups Name Std.Dev.
>>>>> > Worker (Intercept) 5.24354
>>>>> > Worker.1 c1 2.58446
>>>>> > Worker.2 c2 3.71504
>>>>> > Residual 0.96256
>>>>> >
>>>>> > m3 <- lmer(score ~ Machine + (1 | Worker) + (0 + dummy(Machine, "A")
>>>>> > |
>>>>> > Worker) +
>>>>> > (0 + dummy(Machine, "B")
>>>>> > |
>>>>> > Worker) +
>>>>> > (0 + dummy(Machine, "C")
>>>>> > |
>>>>> > Worker), d)
>>>>> > VarCorr(m3)
>>>>> > Groups Name Std.Dev.
>>>>> > Worker (Intercept) 3.78595
>>>>> > Worker.1 dummy(Machine, "A") 1.94032
>>>>> > Worker.2 dummy(Machine, "B") 5.87402
>>>>> > Worker.3 dummy(Machine, "C") 2.84547
>>>>> > Residual 0.96158
>>>>> >
>>>>> > m4 <- lmer(score ~ Machine + (1 | Worker) + (1 | Worker:Machine), d)
>>>>> >
>>>>> >
>>>>> > [1]
>>>>> > https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q2/026775.html
>>>>> > [2] https://stats.stackexchange.com/q/345842/136579
>>>>> > [3] https://stats.stackexchange.com/q/304374/136579
>>>>> >
>>>>> > Best regards,
>>>>> > Maarten
>>>>> >
>>>>> > [[alternative HTML version deleted]]
>>>>> >
>>>>> > _______________________________________________
>>>>> > R-sig-mixed-models using r-project.org mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
>>>
>>
>
More information about the R-sig-mixed-models
mailing list