# Chapter 7 Random and Mixed Effects Models

In this chapter we use a new “philosophy.” Up to now, treatment effects
(the \(\alpha_i\)’s) were *fixed*, unknown quantities that we tried to
estimate. This means we were making a statement about a *specific*,
*fixed* set of treatments (e.g., some specific fertilizers). Such models
are also called **fixed effects models**.

## 7.1 Random Effects Models

### 7.1.1 One-Way ANOVA

Now we use another point of view: we consider situations where
treatments are **random samples** from a **large population** of
treatments. This might seem quite special at first sight, but it is
actually very natural in many situations. Think for example of a random
sample of school classes that were drawn from all school classes in a
country. Another example could be machines that were randomly sampled
from a large population of machines. Typically, we are interested in
making a statement about some properties of the whole population and not
of the observed individuals (here: school classes or machines).

Going further with the machine example: assume that every machine
produces some samples whose quality we assess. We denote the quality of
the \(j\)th sample on the \(i\)th machine with \(Y_{ij}\). We can model such
data with the model
\[\begin{equation}
Y_{ij} = \mu + \alpha_i + \epsilon_{ij},
\tag{7.1}
\end{equation}\]
where \(\alpha_i\) is the effect of the \(i\)th machine. As the machines
were drawn randomly from a large population, we assume
\[
\alpha_i \; \textrm{i.i.d.} \sim N(0, \sigma_{\alpha}^2).
\]
We also call \(\alpha_i\) a **random effect**. Hence,
(7.1) is a so-called **random effects
model**. For the error term we have the
usual assumption \(\epsilon_{ij}\) i.i.d. \(\sim N(0, \sigma^2)\). This
looks very similar to the “old” fixed effects model
(3.4) at first sight. However, note that the
\(\alpha_i\)’s in Equation (7.1) are **random
variables** (reflecting the sampling mechanism of the data) and not
fixed unknown parameters. This “small” change will have a large impact
on the properties of the model. In addition, we have a new parameter
\(\sigma_{\alpha}^2\) which is the variance of the random effect (here the
variance between different machines). Sometimes, such models are also
called **variance components models** because of the different variances
\(\sigma_{\alpha}^2\) and \(\sigma^2\).

Let us now inspect some properties of model (7.1).
The expected value of \(Y_{ij}\) is
\[
E[Y_{ij}] = \mu.
\]
For the variance of \(Y_{ij}\) we have
\[
\Var(Y_{ij}) = \sigma_{\alpha}^2 + \sigma^2.
\]
For the correlation structure we get
\[
\Cor(Y_{ij}, Y_{kl}) =
\left\{
\begin{array}{cc}
0 & i \neq k \\
\sigma_{\alpha}^2 / (\sigma_{\alpha}^2 + \sigma^2) & i = k, j \neq l \\
1 & i = k, j = l
\end{array}
\right.
\]
Observations from different machines are uncorrelated (here:
independent) while observations from the *same* machine are correlated.
We also call \(\sigma_{\alpha}^2 / (\sigma_{\alpha}^2 + \sigma^2)\)
**intraclass correlation (ICC)**. It is
large if \(\sigma_{\alpha}^2 \gg \sigma^2\). A large value means that
observations from the same “group” (here: machine) are very similar to
each other.
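The middle case of the correlation structure can be derived directly from model (7.1): two different samples from the same machine share only the random term \(\alpha_i\), hence
\[
\textrm{Cov}(Y_{ij}, Y_{il}) = \textrm{Cov}(\mu + \alpha_i + \epsilon_{ij}, \, \mu + \alpha_i + \epsilon_{il}) = \Var(\alpha_i) = \sigma_{\alpha}^2, \quad j \neq l.
\]
Dividing by \(\Var(Y_{ij}) = \sigma_{\alpha}^2 + \sigma^2\) gives the intraclass correlation.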

Hence, values “sharing” the same \(\alpha_i\) are *correlated*, while in
the fixed effects model (3.4) all values would
be *independent* (because there the \(\alpha_i\)’s are *parameters*, that
is *fixed*, unknown quantities).
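Since the ICC is just a ratio of variance components, it is simple to compute once \(\sigma_{\alpha}^2\) and \(\sigma^2\) are estimated. A minimal numeric sketch (in Python; the two variance values below are made up for illustration, not taken from any data set):

```python
def icc(var_alpha, var_error):
    """Intraclass correlation: share of the total variance due to groups."""
    return var_alpha / (var_alpha + var_error)

print(icc(9.0, 1.0))  # 0.9: observations in the same group are very similar
print(icc(1.0, 9.0))  # 0.1: grouping explains little of the variation
```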

Parameter estimation for the variance components \(\sigma_{\alpha}^2\) and
\(\sigma^2\) is typically being done with a technique called **restricted
maximum likelihood (REML)**. We
could also use “classical” maximum-likelihood estimators here, but REML
estimates are less biased. The parameter \(\mu\) is estimated with
maximum-likelihood assuming that the variances are known.

In `R` there are many packages that can fit such models. We will consider `lme4` (Bates et al. 2020) and later also `lmerTest` (Kuznetsova, Bruun Brockhoff, and Haubo Bojesen Christensen 2020).

Let us now consider Exercise 5.1 from Kuehl (2000) about an inheritance
study with beef animals. Five sires (male animals) were each mated to a
*separate* group of dams (female animals). The birth weights of eight male
calves (from different dams) in each of the five sire groups were
recorded. We want to find out what part of the total variation is due to
variation between different sires.

We first create the data set and visualize it.

```
## Create data set ####
weight <- c(61, 100, 56, 113, 99, 103, 75, 62,  ## sire 1
            75, 102, 95, 103, 98, 115, 98, 94,  ## sire 2
            58, 60, 60, 57, 57, 59, 54, 100,    ## sire 3
            57, 56, 67, 59, 58, 121, 101, 101,  ## sire 4
            59, 46, 120, 115, 115, 93, 105, 75) ## sire 5
sire    <- factor(rep(1:5, each = 8))
animals <- data.frame(weight, sire)
str(animals)
```

```
## 'data.frame': 40 obs. of 2 variables:
## $ weight: num 61 100 56 113 99 103 75 62 75 102 ...
## $ sire : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
```

```
## Visualize data ####
stripchart(weight ~ sire, vertical = TRUE, pch = 1, xlab = "sire", data = animals)
```

At first sight it looks like the variation between different sires is rather small.

Now we fit the random effects model with the `lmer` function in package `lme4`. We want to have a random effect per sire. This can be specified with the notation `(1 | sire)` in the model formula. This means that the “granularity” of the random effect is specified after the vertical bar “`|`”. All observations sharing the same level of `sire` will get the same random effect \(\alpha_i\). The `1` means that we want to have a random intercept per sire. In the regression setup, we could also have a random slope, for which we would use the notation `(x | sire)` for a continuous predictor `x`. However, this is not of relevance here.

```
library(lme4)
fit.sire <- lmer(weight ~ (1 | sire), data = animals)
```

As usual, the function `summary` gives an overview of the fitted model.

`summary(fit.sire)`

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ (1 | sire)
## Data: animals
##
## REML criterion at convergence: 358.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.9593 -0.7459 -0.1581 0.8143 1.9421
##
## Random effects:
## Groups Name Variance Std.Dev.
## sire (Intercept) 116.7 10.81
## Residual 463.8 21.54
## Number of obs: 40, groups: sire, 5
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 82.550 5.911 13.96
```

In the summary we can read off the table labelled `Random effects` that \(\widehat{\sigma}_{\alpha}^2 = 116.7\) and \(\widehat{\sigma}^2 = 463.8\). Note that the column `Std.Dev.` is nothing else than the square root of the variance estimate and *not* the standard error (accuracy) of the variance estimate. The variance of \(Y_{ij}\) is therefore estimated as \(116.7 + 463.8 = 580.5\). Hence, only about \(116.7 / 580.5 = 20\%\) of the total variance of the birth weight is due to sire (this is the intraclass correlation). This confirms our visual inspection from above.

Under `Fixed effects` we find the estimate \(\widehat{\mu} = 82.55\). It is an estimate of the expected birth weight of a male calf of a randomly selected sire (randomly selected from the *whole* population of all sires).

Approximate confidence intervals can be obtained with the function `confint`.

`confint(fit.sire, oldNames = FALSE)`

```
## 2.5 % 97.5 %
## sd_(Intercept)|sire 0.00000 24.61580
## sigma 17.32943 27.76544
## (Intercept) 69.83802 95.26197
```

Hence, an approximate confidence interval for \(\sigma_{\alpha}\) is given by \([0, \, 24.62]\). We see that the estimate \(\widehat{\sigma}_{\alpha}\) is therefore quite imprecise. The reason is that we only have five sires to estimate the variance.

What would happen if we used the “ordinary” `aov` function here?

```
options(contrasts = c("contr.sum", "contr.poly"))
fit.aov <- aov(weight ~ sire, data = animals)
confint(fit.aov)
```

```
## 2.5 % 97.5 %
## (Intercept) 75.63725 89.46275
## sire1 -12.75051 14.90051
## sire2 1.12449 28.77551
## sire3 -33.25051 -5.59949
## sire4 -18.87551 8.77551
```

As we have used `contr.sum`, the confidence interval for \(\mu\), which can be found under `(Intercept)`, is a confidence interval for the average of the expected birth weights of the male offspring of these five *specific* sires. More precisely: each of the five sires has its expected birth weight \(\mu_i, i = 1, \ldots, 5\) (of a male offspring). With `contr.sum` we have
\[
\mu = \frac{1}{5} \sum_{i=1}^{5} \mu_i,
\]
i.e., \(\mu\) is the average of the expected values of these five
*specific* sires. We see that this confidence interval is *shorter* than
the one from the random effects model. The reason is the different
interpretation: the random effects model allows us to make inferences about
the population of all sires (of which we have seen five so far), while the
fixed effects model only allows inference about these five *specific*
sires. Hence, we are facing a more difficult problem with the random
effects model; this is why we are less confident in our estimate,
resulting in wider confidence intervals compared to the fixed effects
model.
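For balanced data like this, the estimate can be verified by hand: \(\widehat{\mu}\) is the average of the five group means, which coincides with the grand mean of all 40 observations. A quick numeric check (in Python, with the birth weights from above):

```python
# birth weights per sire, as entered in the data set above
sires = [
    [61, 100, 56, 113, 99, 103, 75, 62],   # sire 1
    [75, 102, 95, 103, 98, 115, 98, 94],   # sire 2
    [58, 60, 60, 57, 57, 59, 54, 100],     # sire 3
    [57, 56, 67, 59, 58, 121, 101, 101],   # sire 4
    [59, 46, 120, 115, 115, 93, 105, 75],  # sire 5
]
group_means = [sum(g) / len(g) for g in sires]
mu_hat = sum(group_means) / len(group_means)  # average of the group means

print(group_means)  # [83.625, 97.5, 63.125, 77.5, 91.0]
print(mu_hat)       # 82.55
```

With equal group sizes this average of group means equals the grand mean, matching the `(Intercept)` estimate of both the mixed model and the `aov` fit above.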

We can also get “estimates” (more precisely: conditional means) of the random effects \(\alpha_i\) with the function `ranef`.

`ranef(fit.sire) ## "estimated" (better: conditional means of) random effects`

```
## $sire
## (Intercept)
## 1 0.7183096
## 2 9.9895155
## 3 -12.9796882
## 4 -3.3743848
## 5 5.6462479
##
## with conditional variances for "sire"
```
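These conditional means are not the raw deviations of the per-sire means from \(\widehat{\mu} = 82.55\): they are shrunken towards zero. For a balanced one-way layout the shrinkage factor is \(\widehat{\sigma}_{\alpha}^2 / (\widehat{\sigma}_{\alpha}^2 + \widehat{\sigma}^2/n)\) with \(n = 8\) observations per sire. A numeric sketch (in Python, plugging in the REML estimates and group means from above):

```python
var_alpha, var_error, n = 116.7, 463.8, 8
shrink = var_alpha / (var_alpha + var_error / n)  # about 0.67

group_means = [83.625, 97.5, 63.125, 77.5, 91.0]  # per-sire means of the data
mu_hat = 82.55                                    # overall estimate

cond_means = [shrink * (m - mu_hat) for m in group_means]
print([round(a, 2) for a in cond_means])
# [0.72, 9.99, -12.98, -3.37, 5.65] -- matches ranef() up to rounding
```

The shrinkage is strongest when \(\sigma_{\alpha}^2\) is small relative to \(\sigma^2/n\), i.e., when the group means are noisy estimates of the true machine effects.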

We should of course also check the model assumptions. Here, this additionally includes normality of the random effects (though this is hard to check with only five observations). We get the Tukey-Anscombe plot with the `plot` function.

`plot(fit.sire) ## TA-plot`

To get QQ-plots of the random effects and the residuals we need to extract them first and then use `qqnorm` as usual.

```
par(mfrow = c(1, 2))
qqnorm(ranef(fit.sire)$sire[, "(Intercept)"], main = "Random effects")
qqnorm(resid(fit.sire), main = "Residuals")
```

Sometimes you also see tests of the form \(H_0: \sigma_{\alpha} = 0\) vs. \(H_A: \sigma_{\alpha} > 0\). For model (7.1) we could actually get exact p-values, while for the more complex models considered later this is not the case anymore. There, approximate or simulation-based methods have to be used; some can be found in package `RLRsim` (Scheipl 2020).

### 7.1.2 More Than One Factor

So far this was a one-way ANOVA model with random effects. We can extend this to the two-way ANOVA situation. For this reason we consider Example 7.1 in Kuehl (2000). A manufacturer was developing a new spectrophotometer for medical labs. A critical issue is consistency of measurements from day to day among different machines. Four machines were randomly selected from the production process and tested on four randomly selected days. Per day, eight serum samples were randomly assigned to the four machines (two samples per machine). The response is the triglyceride level [mg/dl] of a sample.

```
## Create data-set ####
## machine 1 machine 2 machine 3 machine 4
y <- c(142.3, 144.0, 148.6, 146.9, 142.9, 147.4, 133.8, 133.2, ## day 1
       134.9, 146.3, 145.2, 146.3, 125.9, 127.6, 108.9, 107.5, ## day 2
       148.6, 156.5, 148.6, 153.1, 135.5, 138.9, 132.1, 149.7, ## day 3
       152.0, 151.4, 149.7, 152.0, 142.9, 142.3, 141.7, 141.2) ## day 4
trigly <- data.frame(y = y, day = factor(rep(1:4, each = 8)),
                     machine = factor(rep(rep(1:4, each = 2), 4)))
str(trigly)
```

```
## 'data.frame': 32 obs. of 3 variables:
## $ y : num 142 144 149 147 143 ...
## $ day : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 2 2 ...
## $ machine: Factor w/ 4 levels "1","2","3","4": 1 1 2 2 3 3 4 4 1 1 ...
```

```
## verify number of observations
xtabs(~ day + machine, data = trigly)
```

```
## machine
## day 1 2 3 4
## 1 2 2 2 2
## 2 2 2 2 2
## 3 2 2 2 2
## 4 2 2 2 2
```

We can for example visualize this data set with an interaction plot.

```
## Plot data ####
with(trigly, interaction.plot(x.factor = day, trace.factor = machine, response = y))
```

We denote the triglyceride level of the \(k\)th sample on day \(i\) and machine
\(j\) by \(Y_{ijk}\). We use the model
\[\begin{equation}
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk},
\tag{7.2}
\end{equation}\]
where \(\alpha_i\) is the random effect of day, \(\beta_j\) is the random effect
of machine and \((\alpha\beta)_{ij}\) is the random interaction term between
day and machine. For the random effects we have the usual assumptions
\[
\alpha_i \, \textrm{ i.i.d.} \sim N(0, \sigma_{\alpha}^2), \quad
\beta_j \, \textrm{ i.i.d.} \sim N(0, \sigma_{\beta}^2), \quad
(\alpha\beta)_{ij} \, \textrm{ i.i.d.} \sim N(0, \sigma_{\alpha\beta}^2).
\]
Let us fit this in `R`. We want to have a random effect per day (= `(1 | day)`), a random effect per machine (= `(1 | machine)`) and a random effect per combination of machine and day (= `(1 | machine:day)`).

```
fit.trigly <- lmer(y ~ (1 | day) + (1 | machine) + (1 | machine:day), data = trigly)
summary(fit.trigly)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ (1 | day) + (1 | machine) + (1 | machine:day)
## Data: trigly
##
## REML criterion at convergence: 215
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.84283 -0.35581 0.03485 0.20700 2.31766
##
## Random effects:
## Groups Name Variance Std.Dev.
## machine:day (Intercept) 34.72 5.892
## machine (Intercept) 57.72 7.597
## day (Intercept) 44.69 6.685
## Residual 17.90 4.230
## Number of obs: 32, groups: machine:day, 16; machine, 4; day, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 141.184 5.323 26.52
```

From the output we get \(\widehat{\sigma}_{\alpha}^2 = 44.7\) (day), \(\widehat{\sigma}_{\beta}^2 = 57.7\) (machine), \(\widehat{\sigma}_{\alpha\beta}^2 = 34.7\) (interaction of day and machine) and \(\widehat{\sigma}^2 = 17.9\).

Hence, total variance is \(44.7 + 57.7 + 34.7 + 17.9 = 155\). We see that the largest contribution to the variance is variability between different machines.
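The decomposition into percentages is simple arithmetic (a quick check in Python, using the REML estimates from the output above):

```python
# REML variance estimates from the summary output above, in (mg/dl)^2
components = {"day": 44.7, "machine": 57.7, "day:machine": 34.7, "residual": 17.9}
total = sum(components.values())

print(round(total, 1))  # 155.0
for name, value in components.items():
    print(f"{name}: {100 * value / total:.1f}%")
# machine contributes the largest share (about 37%)
```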

### 7.1.3 Nesting

To introduce a new concept we consider the `Pastes` data set in `lme4`.
The strength of a chemical paste product was measured for a total of 60
samples coming from 10 randomly selected delivery batches each
containing 3 randomly selected casks. Hence, two samples were taken from
each cask. We want to check what part of the variability of strength is
due to batch and cask.

We first load the data.

```
data("Pastes", package = "lme4")
str(Pastes)
```

```
## 'data.frame': 60 obs. of 4 variables:
## $ strength: num 62.8 62.6 60.1 62.3 62.7 63.1 60 61.4 57.5 56.9 ...
## $ batch : Factor w/ 10 levels "A","B","C","D",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ cask : Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3 1 1 2 2 ...
## $ sample : Factor w/ 30 levels "A:a","A:b","A:c",..: 1 1 2 2 3 3 4 4 5 5 ...
```

If we carefully think about the data structure, we have just discovered
a new way of combining factors. Cask 1 in batch 1 has nothing to do with
cask 1 in batch 2 and so on. The “1” of cask has a different meaning for
every batch. Hence, cask and batch are *not* crossed. We say cask is
**nested** in batch.

We use `ggplot2` to visualize the data set (`R` code for interested readers only).

`ggplot(Pastes, aes(y = cask, x = strength)) + geom_point() + facet_grid(batch ~ .)`

The batch effect doesn’t seem to be very pronounced. On the other hand, casks within the same batch can be substantially different. Let us now model this with appropriate random effects.

Let \(Y_{ijk}\) be the strength of the \(k\)th sample of cask \(j\) in batch \(i\).
We can then use the model
\[\begin{equation}
Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{k(ij)},
\tag{7.3}
\end{equation}\]
where \(\alpha_i\) is the (random) effect of batch and \(\beta_{j(i)}\) is
the (random) effect of cask *within* batch. Note the special notation
\(\beta_{j(i)}\) which emphasizes that cask is nested in batch. We have
also used this nesting notation for the error term here (the error
is always nested, we have just ignored this so far). We have the “usual”
assumptions on the random effects
\[
\alpha_i \, \textrm{ i.i.d.} \sim N(0, \sigma_{\alpha}^2), \quad
\beta_{j(i)} \, \textrm{ i.i.d.} \sim N(0, \sigma_{\beta}^2),
\]
similarly for the error term.

We have to tell `lmer` about the nesting structure. There are multiple ways to do so. We can use the notation `(1 | batch/cask)`, which means that we want to have a random effect per batch and per cask *within* batch. We could also use `(1 | batch) + (1 | cask:batch)`, which means that we want to have a random effect per batch and a random effect per *combination* of batch and cask (which is the same as a nested effect). Last but not least, here we could also use `(1 | batch) + (1 | sample)`. Why? Because `sample` is the combination of `batch` and `cask`. Note that we *cannot* use the notation `(1 | batch) + (1 | cask)` here because then all casks with the same number (across different batches) would share the same effect. This is not what we want.

```
fit.paste <- lmer(strength ~ (1 | batch/cask), data = Pastes)
summary(fit.paste)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: strength ~ (1 | batch/cask)
## Data: Pastes
##
## REML criterion at convergence: 247
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4798 -0.5156 0.0095 0.4720 1.3897
##
## Random effects:
## Groups Name Variance Std.Dev.
## cask:batch (Intercept) 8.434 2.9041
## batch (Intercept) 1.657 1.2874
## Residual 0.678 0.8234
## Number of obs: 60, groups: cask:batch, 30; batch, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 60.0533 0.6769 88.72
```

We see that most variation is in fact due to cask within batch.

A nested effect also has associated degrees of freedom. In the previous example, cask has 3 levels in each of the 10 batches, and hence 2 degrees of freedom within each batch. Therefore, the nested effect of cask has \(10 \cdot 2 = 20\) degrees of freedom.
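This bookkeeping can be replayed as plain arithmetic (a quick sketch in Python):

```python
n_batch, n_cask, n_sample = 10, 3, 2       # batches, casks per batch, samples per cask
n_obs = n_batch * n_cask * n_sample        # 60 observations in total

df_batch = n_batch - 1                     # 9
df_cask_in_batch = n_batch * (n_cask - 1)  # 10 * 2 = 20
df_residual = n_batch * n_cask * (n_sample - 1)  # 30

# together with 1 df for the grand mean this exhausts all observations
assert 1 + df_batch + df_cask_in_batch + df_residual == n_obs
print(df_batch, df_cask_in_batch, df_residual)  # 9 20 30
```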

## 7.2 Mixed Effects Models

In practice we often encounter models which contain both random and
fixed effects. We call them **mixed effects models**.

### 7.2.1 Example: Machines Data

We start with the data set `Machines` from the package `nlme`. As stated
in the help file: “Data on an experiment to compare three brands of
machines used in an industrial process are presented in Milliken and
Johnson (p. 285, 1992). Six workers were chosen randomly among the
employees of a factory to operate each machine three times. The response
is an overall productivity score taking into account the number and
quality of components produced.”

```
data("Machines", package = "nlme")
## technical detail for nicer output:
Machines[, "Worker"] <- factor(Machines[, "Worker"], levels = 1:6, ordered = FALSE)
str(Machines, give.attr = FALSE) ## give.attr in order to shorten output
```

```
## Classes 'nffGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 54 obs. of 3 variables:
## $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
## $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
```

Let us first visualize the data (`R` code for interested readers only).

```
ggplot(Machines, aes(x = Machine, y = score, group = Worker, col = Worker)) +
geom_point() + stat_summary(fun.y = mean, geom = "line")
```

```
## classical interaction plot would be
with(Machines, interaction.plot(x.factor = Machine, trace.factor = Worker,
response = score))
```

We observe that productivity is largest on machine \(C\), followed by \(B\) and \(A\). Most workers show a similar “profile,” with the exception of worker 6 who performs badly on machine \(B\).

Let us now model this data. We assume that there is a population machine effect (think of an average “profile”), but each worker is allowed to have their own (random) deviation. With \(Y_{ijk}\) we denote the \(k\)th productivity score of worker \(j\) on machine \(i\). We use the model \[ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}, \] where \(\alpha_i\) is the fixed effect of machine \(i\) (with the usual side constraint), \(\beta_j\) is the random effect of worker \(j\) and \((\alpha\beta)_{ij}\) is the corresponding (random) interaction. What is the interpretation of this model? The “average” (over the whole population) productivity “profile” with respect to the 3 different machines is given by the fixed effect \(\alpha_i\). The (random) deviation of an individual worker consists of a general shift \(\beta_j\) (worker specific general productivity level) and a worker specific “preference” \((\alpha\beta)_{ij}\) with respect to the 3 machines.

We assume that all random effects are normally distributed, this means \[ \beta_j \, \textrm{ i.i.d.} \sim N(0, \sigma_{\beta}^2), \quad (\alpha\beta)_{ij} \, \textrm{ i.i.d.} \sim N(0, \sigma_{\alpha\beta}^2). \] For the error term we have the usual assumptions.

We visualize the model in the following plot. The solid line is the population average of the machine effect \((= \mu + \alpha_i)\), the dashed line is the population average including the worker specific general productivity level \((= \mu + \alpha_i + \beta_j)\), i.e. a parallel shift of the solid line. The dotted line in addition includes the worker specific machine preference \((= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij})\). Observations of worker \(j\) would fluctuate around these values.

We could use the `lme4` package to fit such a model. We want to have

- a random effect \(\beta_j\) per worker: `(1 | Worker)`
- a random effect \((\alpha\beta)_{ij}\) per combination of worker and machine: `(1 | Worker:Machine)`

Hence, the `lmer` call would look like this:

`fit <- lmer(score ~ Machine + (1 | Worker) + (1 | Worker:Machine), data = Machines)`

As `lme4` does *not* calculate p-values for the fixed effects, we use the package `lmerTest` instead. Technically speaking, it is nothing else than a wrapper for the same function in package `lme4`, but with modified outputs which include p-values. There are still many open issues regarding statistical inference in mixed effects models, see for example http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html. However, for “nice” designs (as the current one), we get proper inference using the `lmerTest` package.

```
options(contrasts = c("contr.treatment", "contr.poly"))
library(lmerTest)
fit <- lmer(score ~ Machine + (1 | Worker) + (1 | Worker:Machine), data = Machines)
```

We get the ANOVA table with the function `anova`.

`anova(fit)`

```
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Machine 38.051 19.025 2 10 20.576 0.0002855 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The (fixed) effect of machine is significant. If we closely inspect the
output, we see that an \(F\)-distribution with 2 and 10 degrees of freedom
is being used. Where does the 10 come from? It seems to be a rather
small number given the sample size. Let us remind ourselves what the fixed
effect actually means. It is the average machine effect, where the
average is taken over the whole population of workers. We know that
every worker has their own random deviation from this effect. Hence, the
relevant question is whether the worker profiles just fluctuate around a
constant machine effect (all \(\alpha_i = 0\)) or whether the machine
effect is substantially larger than this worker specific machine
variation. Technically speaking, the worker specific machine variation
is nothing else than the *interaction* between worker and machine.
Hence, this boils down to comparing the variation between different
machines (having 2 degrees of freedom) to the variation due to the
interaction between machines and workers (having \(2 \cdot 5 = 10\)
degrees of freedom). The `lmerTest` package automatically detects this
because of the structure of the random effects. This way of thinking
also allows another insight. If we want to get a more precise picture
(smaller confidence intervals, more powerful tests) about the population
average of the machine effect, we need to increase the number of workers
as this will increase the denominator degrees of freedom.
Increasing the number of observations for the available workers will
*not* help.

If we call `summary` on the fitted object we get the individual \(\alpha_i\)’s under `Fixed effects`.

`summary(fit)`

```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ Machine + (1 | Worker) + (1 | Worker:Machine)
## Data: Machines
##
## REML criterion at convergence: 215.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.26959 -0.54847 -0.01071 0.43937 2.54006
##
## Random effects:
## Groups Name Variance Std.Dev.
## Worker:Machine (Intercept) 13.9095 3.7295
## Worker (Intercept) 22.8584 4.7811
## Residual 0.9246 0.9616
## Number of obs: 54, groups: Worker:Machine, 18; Worker, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 52.356 2.486 8.522 21.062 1.20e-08 ***
## MachineB 7.967 2.177 10.000 3.660 0.00439 **
## MachineC 13.917 2.177 10.000 6.393 7.91e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) MachnB
## MachineB -0.438
## MachineC -0.438 0.500
```

Again, be careful with the interpretation because it depends on the side constraint that is being used. For example, here we can read off the output that the productivity score on machine \(B\) is on average 7.97 larger than on machine \(A\). We can also get the parameter estimates of the fixed effects only by calling `fixef(fit)`.

Estimates of the different variance components can be found under `Random effects`. Often, we are not very much interested in the actual values. We use the random effects as a “tool” to model correlated data and to tell the fitting function that we are interested in the population average and not the worker specific machine effects. In that sense, including random effects in a model changes the interpretation of the fixed effects. Hence, it really depends on the research question whether we treat a factor as fixed or random. There is no right or wrong here.

Approximate confidence intervals can be obtained with `confint`.

`confint(fit, oldNames = FALSE)`

```
## 2.5 % 97.5 %
## sd_(Intercept)|Worker:Machine 2.3528037 5.431503
## sd_(Intercept)|Worker 1.9514581 9.410584
## sigma 0.7759507 1.234966
## (Intercept) 47.3951611 57.315949
## MachineB 3.7380904 12.195243
## MachineC 9.6880904 18.145243
```

For example, a confidence interval for the expected value of the difference between machine \(A\) and \(B\) is given by \([3.74, 12.2]\).

We do the residual analysis as usual.

```
## Tukey-Anscombe plot:
plot(fit)
```

```
## QQ-plots:
par(mfrow = c(1, 3))
qqnorm(ranef(fit)$Worker[, 1], main = "Random effects of worker")
qqnorm(ranef(fit)$'Worker:Machine'[, 1], main = "Random interaction")
qqnorm(resid(fit), main = "Residuals")
```

```
## could also use the following code for generating QQ-plots:
## library(lattice)
## qqmath(ranef(fit))
```

The Tukey-Anscombe plot looks good. The QQ-plots could look better, however we don’t have a lot of observations.

Again, in order to better understand the mixed effects model, we check what happens if we fit a purely fixed effects model here.

```
fit.fixed <- aov(score ~ Machine * Worker, data = Machines)
summary(fit.fixed)
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## Machine 2 1755.3 877.6 949.17 <2e-16 ***
## Worker 5 1241.9 248.4 268.62 <2e-16 ***
## Machine:Worker 10 426.5 42.7 46.13 <2e-16 ***
## Residuals 36 33.3 0.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The machine effect is much more significant. This is because the fixed
effects model makes a statement about the machine effect of these 6
*specific* workers and *not* about the population average (in the same
spirit as in the sire example above).

Some technical detail: We can actually get the correct p-value for the mixed effects model from the above fixed effects model output. If we divide the machine mean square by the mean square of the interaction effect we get 20.58. Hence, the p-value of machine is given by

`pf(20.58, df1 = 2, df2 = 10, lower.tail = FALSE)`

`## [1] 0.0002853299`

This is exactly the value from the output of the mixed model above (up to some rounding).
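The mean-square arithmetic behind this is easy to redo (a quick check in Python, with the sums of squares from the `aov` table above):

```python
# sums of squares and degrees of freedom from the aov() output above
ss_machine, df_machine = 1755.3, 2
ss_interaction, df_interaction = 426.5, 10

ms_machine = ss_machine / df_machine              # 877.65
ms_interaction = ss_interaction / df_interaction  # 42.65
f_value = ms_machine / ms_interaction             # F statistic of the mixed model

print(round(f_value, 2))  # 20.58
```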

### 7.2.2 Example: Cheese Data

We continue with Example 12.2 from Oehlert (2010) (with some simulated data). A group of 10 “rural” and 10 “urban” raters rated 4 different cheese types \((A, B, C, D)\). Every rater got to eat two samples from the same cheese type in random order. Hence, we have a total of \(20 \cdot 4 \cdot 2 = 160\) observations.

```
cheese <- read.table("http://stat.ethz.ch/~meier/teaching/data/cheese.dat", header = TRUE)
cheese[, "rater"] <- factor(cheese[, "rater"])
str(cheese)
```

```
## 'data.frame': 160 obs. of 4 variables:
## $ cheese : chr "A" "A" "B" "B" ...
## $ rater : Factor w/ 10 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ background: chr "rural" "rural" "rural" "rural" ...
## $ y : int 67 66 51 53 75 70 68 66 76 76 ...
```

Note that the factor `rater` has only 10 levels. We have to be careful here and should not forget that rater is actually nested in background (so the raters labelled “1” in the urban and the rural group have nothing in common).

We can easily visualize this data with an interaction plot. We use the package `ggplot2` to get a more appealing plot compared to the function `interaction.plot` (`R` code for interested readers only).

```
ggplot(cheese, aes(x = cheese, y = y, group = interaction(background, rater),
color = background)) + stat_summary(fun.y = mean, geom = "line")
```

Questions that could arise:

- Is there a background effect (urban / rural)?
- Is there a cheese type effect?
- Does the effect of cheese type depend on background (interaction)?
- How large is the variability between different raters regarding general cheese liking level?
- How large is the variability between different raters regarding cheese type preference?

Here, background and cheese type are fixed effects as we want to make a
statement about these *specific* levels. However, we treat rater as
random as we want the fixed effects to be population averages and
as we are interested in the variation between different raters.

We can use the model \[ Y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \delta_{k(i)} + (\beta\delta)_{jk(i)} + \epsilon_{ijkl}, \] where

- \(\alpha_i\) is the (fixed) effect of background,
- \(\beta_j\) is the (fixed) effect of cheese type
- \((\alpha\beta)_{ij}\) is the corresponding interaction (fixed effect)
- \(\delta_{k(i)}\) is the random effect of rater (“general cheese liking level of rater \(k(i)\)”)
- \((\beta\delta)_{jk(i)}\) is the (random) interaction of rater and cheese type (“specific cheese type preference of rater \(k(i)\)”).

In this setup, there is an “average” (over the whole population) preference “profile” with respect to the 4 different cheese types, but every rater can have their own random deviation from this “profile.” This deviation consists of a general shift \(\delta_{k(i)}\) (general cheese liking of a rater) and a rater specific preference \((\beta\delta)_{jk(i)}\) with respect to the 4 cheese types. This is visualized in the following picture (for urban raters only, \(i = 2\)). The situation is basically very similar to the previous example about machines. The solid line is the population average \((= \mu + \alpha_2 + \beta_j + (\alpha\beta)_{2j})\). The dashed line is a parallel shift including the rater specific general cheese liking level \((= \mu + \alpha_2 + \beta_j + (\alpha\beta)_{2j} + \delta_{k(2)})\). The dotted line in addition includes the rater specific cheese type preference \((\mu + \alpha_2 + \beta_j + (\alpha\beta)_{2j} + \delta_{k(2)} + (\beta\delta)_{jk(2)})\). Therefore, the fixed effects have to be interpreted as population averages.

Let us now try to fit this model in `R`.

We want to have

- main effects and the interaction for the fixed effects of background and cheese type: `background * cheese`
- a random effect per rater (nested in background): `(1 | rater:background)`
- a random effect per cheese type and rater (nested in background): `(1 | rater:background:cheese)`

We always write `rater:background` to get a unique rater ID. An alternative would be to define another factor in the data set which enumerates the raters from 1 to 20 (instead of from 1 to 10).

Hence, the `lmer` call looks as follows.

```
library(lmerTest)
fit <- lmer(y ~ background * cheese + (1 | rater:background) +
              (1 | rater:background:cheese), data = cheese)
```

We get an ANOVA table with p-values for the fixed effects with the function `anova`.

`anova(fit)`

```
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## background 82.13 82.13 1 18 8.2590 0.0101 *
## cheese 3144.66 1048.22 3 54 105.4149 <2e-16 ***
## background:cheese 15.42 5.14 3 54 0.5168 0.6725
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We see that the interaction is not significant but there is a significant effect of background and cheese type. This is what we basically already observed in the interaction plot. There, the profiles were quite parallel, but raters with urban background rated higher on average than those with rural background. In addition, there was a clear difference between different cheese types.

Can we get an intuitive idea about the denominator degrees of freedom that are being used in these tests?

- The main effect of background can be thought of as a two-sample \(t\)-test with two groups having 10 observations each. Think of taking one (average) value per rater. Hence, we get \(2 \cdot 10 - 2 = 18\) degrees of freedom.
- As we allow for a rater-specific cheese type preference, we have to check whether the effect of cheese is substantially larger than this rater-specific variation. The rater-specific variation can be thought of as the interaction between rater and cheese type. As rater is nested in background, rater has 9 degrees of freedom in each background group. Hence, the interaction has \((9 + 9) \cdot 3 = 54\) degrees of freedom.
- The same argument as above holds true for the interaction between background and cheese type.

Hence, if we want to get a more precise view about these population average effects, we need to increase the number of raters, which seems pretty natural.
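The degrees-of-freedom reasoning above can also be replayed as plain arithmetic (a quick check in Python):

```python
n_raters = 20      # 10 rural + 10 urban
n_backgrounds = 2
n_cheese = 4

# background: like a two-sample t-test on one (average) value per rater
df_background = n_raters - n_backgrounds                    # 20 - 2 = 18

# rater is nested in background: 10 - 1 = 9 df within each group
df_rater = n_backgrounds * (n_raters // n_backgrounds - 1)  # 2 * 9 = 18

# rater-by-cheese interaction: denominator df for cheese and background:cheese
df_denominator = df_rater * (n_cheese - 1)                  # 18 * 3 = 54

print(df_background, df_denominator)  # 18 54
```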

Approximate confidence intervals for the individual coefficients of the fixed effects and the variance components can again be obtained with `confint`.

`confint(fit, oldNames = FALSE)`

```
## 2.5 % 97.5 %
## sd_(Intercept)|rater:background:cheese 2.5977830 4.458964
## sd_(Intercept)|rater:background 3.5472552 7.425635
## sigma 2.7213541 3.713053
## (Intercept) 66.4559409 74.744059
## backgroundurban 2.5394154 14.260584
## cheeseB -20.6130977 -13.286902
## cheeseC 0.9869023 8.313098
## cheeseD -5.1630977 2.163098
## backgroundurban:cheeseB -7.6304024 2.730402
## backgroundurban:cheeseC -4.8304024 5.530402
## backgroundurban:cheeseD -7.1304024 3.230402
```