[R-sig-ME] R-sig-mixed-models Digest, Vol 136, Issue 41
Rune Haubo
rune.haubo at gmail.com
Sat Apr 28 11:32:58 CEST 2018
On 27 April 2018 at 22:33, Nicolas Flaibani <n.flaiba at hotmail.com> wrote:
>
> Hi everyone!
>
>
> I would like to join the discussion of the second topic (Specifying Multiple Random Effects in NLME by Dan) but in my case I’m going to talk about the lme4 package, instead of the nmle. I think that the question is still relevant.
>
> I’ve had a doubt for a long time about how to investigate the interactions between random and fixed effects. I’ve read a lot of forums, papers and help’s packages and I’ve always concluded that the correct form of testing the interaction between a random and a fixed variable was:
>
>
> Model1 <- lmer (Y ~ X1 + X2 + (X1 | Random Variable)
>
>
> However, I found in some forums and personal communications from several researchers that there is another way to investigate the interaction between random and fixed variables and has the following syntax:
>
>
> Model2 <- lmer (Y ~ X1 + X2 + (1 | Random Variable: X1)
>
The 'classical' (think DOE and 'variance-components') interaction is
Model2 if X1 is categorical/factor and Model1 if X1 is continuous
(then Model1 is sometimes called the 'random coefficient model').
If X1 is continuous Model2 doesn't make sense - on the other hand
Model1 can be fitted if X1 is a factor. In that case Model1 is rather
complex and the number of parameters grows rapidly with the number of
levels of X1 - in comparison the random term in Model2 uses just one
(variance) parameter. While some people often favour 'Model1 with
factor X1'- construction, I often think it is difficult to explain
what this model actually means; it is also very often overfitting
since it requires a lot of data and special data-generating mechanism
to support models of that form.
>
> I understand that this syntax
>
>
> (1|Random Variable/X1) = (1|Random Variable)+(1|Random Variable:X1)
>
>
> specify a nested structure between the variables and this is not the case of interest.
This construction serves two purposes: one is when the random-effect
variables have a 'truly' nested structure such as pupils in classes
(in schools, in districts, etc). The other purpose often applies to
designed experiments where you might have machines (fixed) and
operators (random). The main effects model is then
Model3 <- lmer(Y ~ machines + (1 | operators))
and a model that includes the interaction reads
Model3b <- lmer(Y ~ machines + (1 | operators) + (1 | machines:operators))
Technically the 'machines:operators' combinations are nested in
'operators' but we usually don't think of it that way.
The point here is that we need to also consider Model3b as an
alternative to Model1 and Model2. Of these models, Model3 _and_ Model2
are the simplest while Model1 is the most complex with Model3b in the
middle often serving as an appropriate compromise.
>
> My particular question is whether the syntax of the Model 2 is correct to test interactions between random and fixed variables. If this model is correct, which are the differences with the syntax of Model 1, since the resulting models are clearly different? Besides, in coincidence with the question of Dan (“Are there cases where one vs. the other formulation should absolutely be used? My understanding that for continuous variables, e.g., multiple measurements across multiple days, Days|Object would be the correct syntax. But here we're talking about a factor variable”), I ask if one type of syntax should be used if the fixed variables are continuous or there are factors.
>
> If I compare the summary from a model with the latter syntax (model 2), with the summary of the same analysis made with a statistic program (like Statistica), the results are very similar. That’s not the case with the model 1.
>
> For example, if I analyze a morphological trait with the syntax
>
>
> M2 <- lmer (Wing ~ Temperature * Sex + Temperature + Sex + (1 | Line) + (1 | Line:Sex:Temperature) + (1 | Line:Sex) + (1 | Line:Temperature))
>
>
> the summary is the following:
>
>
> Random effects:
>
> Groups Name Variance Std.Dev.
>
> Line:Sex:Temperature (Intercept) 14.6231 3.8240
>
> Line:Temperature (Intercept) 154.7685 12.4406
>
> Line:Sex (Intercept) 0.6947 0.8335
>
> Line (Intercept) 72.5945 8.5202
>
> Residual 180.0664 13.4189
>
>
> Fixed effects:
>
> Estimate Std. Error df t value Pr(>|t|)
>
> (Intercept) 501.141 2.268 96.940 221.009 < 2e-16 ***
>
> Temperature25 -57.960 2.699 54.800 -21.473 < 2e-16 ***
>
> SexM -53.639 1.001 96.260 -53.584 < 2e-16 ***
>
> Temperature25:SexM -6.488 1.391 48.300 -4.663 2.49e-05 ***
>
>
> I found that the function rand() from the lmerTest package gives me the p values of the random effects if I write the model like this:
>
>> rand(M2)
>
> Analysis of Random effects Table:
>
>
> Chi.sq Chi.DF p.value
>
> Line 4.6152 1 0.03 *
>
> Line:Sex:Temperature 30.8130 1 3e-08 ***
>
> Line:Sex 0.0391 1 0.84
>
> Line:Temperature 112.1539 1 <2e-16 ***
>
>
> I don’t know if this function is reliable because it is not mentioned for testing the significance of the random effects in the page https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
Having written rand (=ranova) my views may be biased but I should say
that it _is_ reliable. That is, I haven't seen cases where it's not
doing what was intended ;-) rand() and ranova() simply compares two
models using anova(m1, m2, refit=FALSE) where m2 differs from m1 by
having one of its random-effect terms reduced or removed.
>
> The summary of the same analysis made with the Statistica is:
>
>
>
>
> Effect
>
> SS
>
> Degr. of
>
> MS
>
> Den.Syn.
>
> Den.Syn.
>
> F
>
> p
>
> Intercept
>
> Fixed
>
> 764579151
>
> 1
>
> 764579151
>
> 48,001
>
> 12655,91
>
> 60412,83
>
> 0,000000
>
> Line
>
> Random
>
> 608181
>
> 48
>
> 12670
>
> 47,800
>
> 6489,64
>
> 1,95
>
> 0,011254
>
> Sex
>
> Fixed
>
> 3138661
>
> 1
>
> 3138661
>
> 48,038
>
> 495,62
>
> 6332,81
>
> 0,000000
>
> Temperature
>
> Fixed
>
> 3686660
>
> 1
>
> 3686660
>
> 48,003
>
> 6459,62
>
> 570,72
>
> 0,000000
>
> Line*Sex
>
> Random
>
> 23808
>
> 48
>
> 496
>
> 48,000
>
> 473,30
>
> 1,05
>
> 0,435866
>
> Line*Temperature
>
> Random
>
> 310413
>
> 48
>
> 6467
>
> 48,000
>
> 473,30
>
> 13,66
>
> 0,000000
>
> Sex*Temperature
>
> Fixed
>
> 10075
>
> 1
>
> 10075
>
> 48,040
>
> 472,94
>
> 21,30
>
> 0,000029
>
> Line*Sex*Temperature
>
> Random
>
> 22718
>
> 48
>
> 473
>
> 3696,000
>
> 167,33
>
> 2,83
>
> 0,000000
>
> Error
>
> 618467
>
> 3696
>
> 167
>
>
>
>
> But if I write the model with the other syntax:
>
> M1 <- lmer(Wing ~ Temperature * Sex + (Temperature * Sex | Line))
>
Writing the model as
M1 <- lmer(Wing ~ Temperature * Sex + (0 + Temperature:Sex | Line))
is often preferable as it makes the random-effect variance-covariance
matrix easier to interpret.
>
>
> the summary is the following:
>
>
>
> REML criterion at convergence: 31440.9
>
>
>
> Random effects:
>
> Groups Name Variance Std.Dev. Corr
>
> Line (Intercept) 266.78 16.333
>
> Temperature25 398.27 19.957 -0.60
>
> SexM 41.54 6.446 -0.56 0.46
>
> Temperature25:SexM 61.34 7.832 0.56 -0.61 -0.80
>
> Residual 167.33 12.936
>
>
>
> Fixed effects:
>
> Estimate Std. Error df t value Pr(>|t|)
>
> (Intercept) 501.603 2.371 48.046 211.586 < 2e-16 ***
>
> Temperature25 -58.423 2.911 48.027 -20.070 < 2e-16 ***
>
> SexM -53.659 1.095 47.964 -49.023 < 2e-16 ***
>
> Temperature 25:SexM -6.470 1.393 48.278 -4.644 2.66e-05 ***
>
> ---
>
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>
>
> In addition, if I apply the ‘rand function´ for this syntax (M1), it doesn’t retourn the whole p-values of the random effects (Do not give me p value for line and line*Temperature*Sex)
Use lmerTest::ranova(M1, reduce.terms=FALSE) to achieve a test for the
entire random-effect term (and make sure you have lmerTest version >=
3.0-0 installed).
>
> Analysis of Random effects Table:
>
> Chi.sq Chi.DF p.value
>
> Temperatura:Línea 0.00e+00 0 1
>
> Sexo:Línea 1.46e-10 0 <2e-16 ***
>
>
> I really appreciate your time and dedication for answering this questions. Thank you for trying to help us understand a little more about the syntax of these complex models and thus better understand their correct approach.
You are not alone if you think this is complicated. My students are
for the most part challenged in simply writing up the mathematical
formula that correctly represents models such as Model1 -
transitioning from scalar random-effect terms (e.g. Model3b) to
vector-valued random-effect terms (e.g. Model1) often takes time and
dedication.
Cheers
Rune
>
> Thank you very much for your time everyone.
>
>
>
> Greetings,
>
> Nicolas
>
>
> ----------------------------------------------------------------------------------------
> Message: 2
> Date: Wed, 25 Apr 2018 16:11:38 -0400
> From: Dan <ieshan at gmail.com>
> To: "R-SIG-Mixed-Models at R-project.org"
> <R-sig-mixed-models at r-project.org>, Ben Bolker <bbolker at gmail.com>
> Subject: [R-sig-ME] Specifying Multiple Random Effects in NLME
> Message-ID:
> <CAET4i1f-SH5zA6xcCxNXc091HCk+snMv+rFm0tf995yYukiCOw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi all:
>
> I am curating an off-list thread about specifying multiple random effects
> in NLME.
>
> 1. If it's (1|Object) + (1|Object:Coating) that you want then
> you should be able to use a nested specification (which nlme *can*
> handle relatively easily), i.e. something like
>
> random=a+b+c~1|Object/Coating
>
>
> Although (Coating|Object) and (1|Object:Coating) both in some sense
> represent "interactions" the latter is *much* simpler/more parsimonious.
>
> If you're OK with 1|Object:Coating rather than Coating|Object it
> should be *much* faster. If you don't understand the distinction (which
> would be absolutely fine and understandable) can we resume the
> discussion on r-sig-mixed-models ... ?
> -----------
>
> So:
> random=a+b+c~Coating|Object
> does not fit.
>
> But:
> random=a+b+c~Object/Coating
> fits.
>
> Can you better explain the distinction here? I have sometimes used the
> 1|Object:Coating + 1|Object syntax and sometimes the Coating|Object syntax
> in other models. My experience/understanding is that the former syntax with
> multiple "within subject" variables produces exactly matching output to the
> standard "repeated measures ANOVA" with the lmer assumption of compound
> symmetry.
>
> Are there cases where one vs. the other formulation should absolutely be
> used? My understanding that for continuous variables, e.g., multiple
> measurements across multiple days, Days|Object would be the correct syntax.
> But here we're talking about a factor variable.
>
>
> 2. I'm trying to read the "random" section for nlme right now but it's
> kind of making my head explode (and I think there's a typo: " the same
> as the order of the order of the elements in the list"). It *sounds*
> like (1) explicitly creating an interaction
> ObjCoating=interaction(Object,Coating) and (2) using something like
>
> list(ObjCoating=a~1,Object=b~1,Object=c~1)
>
> should work (grouping factors as names, then [right-hand-side variable
> name]~[random effects model], but I'm worried about the phrase "The
> order of nesting will be assumed the same as the order of the elements
> in the list": what nesting?
> -----------
>
> I think that formulation is explicitly in order. I replaced your first
> ObjCoating with simply Object, just to test what would happen:
>
> Random effects:
> Formula: a ~ 1 | Object
> a.(Intercept)
> StdDev: 1.305816
>
> Formula: b ~ 1 | Object %in% Object
> b.(Intercept)
> StdDev: 0.01576521
>
> Formula: c ~ 1 | Object %in% Object %in% Object
> c.(Intercept) Residual
> StdDev: 2.677883 2.219676
>
> [[alternative HTML version deleted]]
>
>
>
>
> *****
>
> [[alternative HTML version deleted]]
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
More information about the R-sig-mixed-models
mailing list