Chapter 5 Factorial Treatment Structure
In the completely randomized designs that we have seen so far, the \(g\) different treatments had no special “structure”. In practice, treatments are often combinations of the levels of two or more factors. Think for example of a plant experiment using combinations of light exposure and fertilizer. We call this a factorial treatment structure. If we see all possible combinations of the levels of two (or more) factors, we call them crossed. An illustration of two crossed factors can be found in Table 5.1.
exposure / fertilizer  \(A\)  \(B\) 

low  x  x 
medium  x  x 
high  x  x 
Let us have a look at the cheddar
data set (Example 8.6 in
Oehlert (2010)). An experiment was performed how intentionally added
bacteria affect cheese quality. Two strains of bacteria, “R50#10” and
“R21#2” were investigated. None (control treatment), both or one
of the strains were being added (R50
with levels "yes"
and "no"
and R21
with levels "yes"
and "no"
). Total free amino acids was
measured as response (acids
). The data is available in Table
5.2.
## Create data (skip if not interested) ####
acids < c(1.697, 1.601, 1.830,
2.032, 2.017, 2.409,
2.211, 1.673, 1.973,
2.091, 2.255, 2.987)
R50 < rep(c("no", "yes", "no", "yes"), each = 3)
R21 < rep(c("no", "no", "yes", "yes"), each = 3)
cheddar < data.frame(R50, R21, acids)




Clearly, R50
and R21
are crossed factors. We have 3 observations for
every combination of R50
and R21
. With R
we can easily count the
observations with the function xtabs
. The formula ~ R50 + R21
in
xtabs
reads as “count the number of observations for every combination
of the levels of the two factors R50
and R21
”.
## R21
## R50 no yes
## no 3 3
## yes 3 3
With a factorial treatment structure we typically have questions about both factors and/or their possible interplay:
 “Does the effect of adding R50#10 depend on whether we have added R21#10 or not (or vice versa)?”
If the effects do not depend on the other factor, we could also ask:
“What is the effect of adding R50#10 on the expected value of the total free amino acids?”
“What is the effect of adding R21#10 on the expected value of the total free amino acids?”
We can visualize such kind of data with a so called interaction
plot using the function interaction.plot
.
For every combination of R50
("yes" / "no"
) and R21
("yes" / "no"
) we calculate the average value of the response. We use R50
on
the xaxis (the first argument, also called x.factor
). In addition,
settings corresponding to the same level of R21
are connected with
lines (argument trace.factor
). The role of R50
and R21
can of
course also be interchanged. Hence, what factor we use as x.factor
or as tace.factor
is a question of personal preference.
## elegant way, using the function "with"
with(cheddar, interaction.plot(x.factor = R50, trace.factor = R21, response = acids))
## standard way: interaction.plot(x.factor = cheddar$R50, trace.factor = cheddar$R21,
## response = cheddar$acids)
We see that the lines are (almost) parallel here. This means that it
looks like the effect of R50
does not depend on the level of R21
(and vice versa). Adding “R50#10” seems to increase total free amino
acids by about 0.5, while adding “R21#10” has an effect of about 0.3 on
total free amino acids. One disadvantage of an interaction plot is the
fact that it only plots averages and we do not see the underlying
variation of the indidivual observations anymore. Hence, we cannot
easily conclude that the effects which we observe in the plot are
statistically significant.
A more sophisticated version could be plotted with the addon package
ggplot2
(Wickham et al. 2020) where we plot both the individual observations
and the lines (for interested readers only).
library(ggplot2)
ggplot(cheddar, aes(x = R50, y = acids, color = R21)) + geom_point() +
stat_summary(fun.y = mean, geom = "line", aes(group = R21), size = 1) + theme_bw()
With this plot we immediately see that the effects are rather small compared to the variation of the individual observations. If we have many observations we could also use boxplots for every setting instead of the individual observations.
Let us now set up a parametric model which allows us to make a statement about the uncertainty of the estimated effects.
5.1 TwoWay ANOVA Model
We assume a general setup with a factor \(A\) with \(a\) levels, a factor \(B\) with \(b\) levels and \(n\) replicates for every combination of \(A\) and \(B\) (a balanced design). Hence, we have a total of \(N = a \cdot b \cdot n\) observations. In the previous example we had \(a = 2\), \(b = 2\), \(n = 3\) and therefore \(N = 12\).
The twoway ANOVA model with interaction is \[ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ijk} \] where
 \(\alpha_i\) is the main effect of factor \(A\) at level \(i\)
 \(\beta_j\) is the main effect of factor \(B\) at level \(j\)
 \((\alpha \beta)_{ij}\) is the interaction effect between \(A\) and \(B\) for the level combination \(i,j\) (it is not the product \(\alpha_i \beta_j\))
 \(\epsilon_{ijk}\) are i.i.d. \(N(0, \sigma^2)\) errors.
Typically, we use sumtozero side constraints, i.e. \(\sum_{i=1}^a \alpha_i = 0\), \(\sum_{j=1}^b \beta_j = 0\), \(\sum_{i=1}^a (\alpha \beta)_{ij} = 0\) and \(\sum_{j=1}^b (\alpha \beta)_{ij} = 0\). Other choices are also possible. Hence, the main effects have \(a  1\) and \(b  1\) degrees of freedom, respectively. The degrees of freedom of the interaction term are \((a 1)(b  1)\) which is the product of the degrees of freedom of the involved main effects.
The effects can be interpreted as follows: Think of main effects as “average effects” on the expected value of the response when changing the level of a factor (keeping the other factor fixed). The interaction effect can be thought of as a correction factor to the main effects model. The interaction effect tells us how much an effect of a certain factor changes, when we “switch” the level of the other factor. Strictly speaking, interpretation depends on the sideconstraint that we apply. An illustration of the model can be found in Figure 5.1. It is nothing else than the “theoretical” version of the interaction plot where we have the expected value instead of the sample average on the yaxis.
A model without interaction term is additive. This means that the effect of \(A\) does not depend on the level of \(B\) (and vice versa), “it is always the same, no matter what the level of the other factor”. An illustration can be found in Figure 5.2, where we see that the lines are parallel. This means, changing \(B\) from level 1 to level 2 has always the same effect (here: 2). Similarly, the effect of changing \(A\) does not depend on the level of \(B\) (here: 1 and 3).
5.1.1 Parameter Estimation
As before, we estimate parameters using the principles of least squares. Using sumtozero side constraints we get
Parameter  Estimate 

\(\mu\)  \(\widehat{\mu} = \overline{y}_{\cdot\cdot\cdot}\) 
\(\alpha_i\)  \(\widehat{\alpha}_i = \overline{y}_{i\cdot\cdot}  \overline{y}_{\cdot\cdot\cdot}\) 
\(\beta_j\)  \(\widehat{\beta}_j = \overline{y}_{\cdot j \cdot}  \overline{y}_{\cdot\cdot\cdot}\) 
\((\alpha \beta)_{ij}\)  \(\widehat{(\alpha \beta)}_{ij} = \overline{y}_{ij\cdot}  \widehat{\mu}  \widehat{\alpha}_i  \widehat{\beta}_j\) 
As before: if we replace an index with a dot, we take the mean (or the sum) over that “dimension”. For example, \(\overline{y}_{1\cdot\cdot}\) is the mean over all observations \(y_{1jk}\), \(j = 1, \ldots, b\) and \(k = 1, \ldots, n\). Hence, we estimate the expected value of the response \(Y_{ijk}\) for \(A\) at level \(i\) and \(B\) at level \(j\) as \[ \widehat{\mu} + \widehat{\alpha}_i + \widehat{\beta}_j + \widehat{(\alpha \beta)}_{ij} = \overline{y}_{i j \cdot} \] which is nothing else than the mean of the observations in the corresponding “cell” (which is no surprise). Note however that we “untangled” the effect with respect to the two main effects and the interaction.
If we carefully inspect the parameter estimates from above, we see that for the main effects, we use an estimate that completely ignores the other factor. We basically treat the problem as a oneway ANOVA model. This is a consequence of the balanced design. For all levels of \(A\) we have the “same population” of \(B\) settings. Hence, if we compare \(\overline{y}_{1\cdot\cdot}\) (think of taking the average over the first row) with \(\overline{y}_{2\cdot\cdot}\) (average over second row) the effect is only due to changing \(A\) from level 1 to level 2. In regression terminology we would call this an orthogonal design.
In R
we fit the model using the function aov
. The model formula for
the model including the interaction is acids ~ R50 * R21
which is
equivalent to acids ~ R50 + R21 + R50:R21
. This means
whenever we combine two predictors with *
, the corresponding main
effects are automatically included (which is typically a reasonable
approach). The formula acids ~ R50 + R21
would fit a main effects model.
options(contrasts = c("contr.sum", "contr.poly"))
fit.cheddar < aov(acids ~ R50 * R21, data = cheddar)
coef(fit.cheddar)
## (Intercept) R501 R211 R501:R211
## 2.06466667 0.23383333 0.13366667 0.01216667
## Full coefficients are
##
## (Intercept): 2.064667
## R50: 0.2338333
## R21: 0.1336667
## R50:R21: 0.01216667
5.1.2 Tests
As for the oneway ANOVA case, the total sum of squares \(SS_T\) can be partioned into different sources \[ SS_T = SS_A + SS_B + SS_{AB} + SS_E, \] where
Source  Sum of squares  Comment 

\(A\)  \(SS_A = \sum_{i=1}^a b n (\widehat{\alpha}_i)^2\)  “between rows” 
\(B\)  \(SS_B = \sum_{j=1}^b a n (\widehat{\beta}_j)^2\)  “between columns” 
\(AB\)  \(SS_{AB} = \sum_{i=1}^a \sum_{j=1}^b n (\widehat{\alpha\beta})_{ij}^2\)  “correction” 
Error  \(SS_E = \sum_{i=1}^a \sum_{j=1}^b \sum_{k=1}^n (y_{ijk}  \overline{y}_{ij\cdot})^2\)  “within cells” 
Total  \(SS_T = \sum_{i=1}^a \sum_{j=1}^b \sum_{k=1}^n (y_{ijk}  \overline{y}_{\cdot\cdot\cdot})^2\)  “total” 
The degrees of freedom of the error term is nothing else than the degrees of freedom of the total sum of squares (\(abn  1\), number of observations minus 1) minus the degrees of freedom of the other effects.
We can construct an ANOVA table based on this information
Source  df  Sum of squares  Mean squares  \(F\)ratio 

\(A\)  \(a  1\)  \(SS_A\)  \(MS_A = \frac{SS_A}{a1}\)  \(\frac{MS_A}{MS_E}\) 
\(B\)  \(b  1\)  \(SS_B\)  \(MS_B = \frac{SS_B}{b1}\)  \(\frac{MS_B}{MS_E}\) 
\(AB\)  \((a1)(b1)\)  \(SS_{AB}\)  \(MS_{AB} = \frac{SS_{AB}}{(a1)(b1)}\)  \(\frac{MS_{AB}}{MS_E}\) 
Error  \(ab(n1)\)  \(SS_E\)  \(MS_E = \frac{SS_E}{ab(n1)}\) 
As before we can construct tests based on the corresponding \(F\)distributions.
 interaction
 \(H_0: (\alpha \beta)_{ij} = 0\) for all \(i,j\)
 \(H_A\): at least one \((\alpha \beta)_{ij} \neq 0\)
 Under \(H_0\): \(\frac{MS_{AB}}{MS_E} \sim F_{(a1)(b1), \, ab(n1)}\)
 main effect A
 \(H_0: \alpha_i = 0\) for all \(i\)
 \(H_A\): at least one \(\alpha_i \neq 0\)
 Under \(H_0\): \(\frac{MS_A}{MS_E} \sim F_{a1, \, ab(n1)}\)
 main effect B
 \(H_0: \beta_j = 0\) for all \(j\)
 \(H_A\): at least one \(\beta_j \neq 0\)
 Under \(H_0\): \(\frac{MS_B}{MS_E} \sim F_{b1, \, ab(n1)}\)
Again, we get the ANOVA table and the corresponding pvalues with the
summary
function. We read the following R
output from bottom to
top. This means that we first check whether we need an interaction
term or not. If there is no evidence of interaction, we continue
with the inspection of the main effects. If there is evidence of
interaction, we start analyzing the effect in more
detail (see next example).
## Df Sum Sq Mean Sq F value Pr(>F)
## R50 1 0.6561 0.6561 7.234 0.0275 *
## R21 1 0.2144 0.2144 2.364 0.1627
## R50:R21 1 0.0018 0.0018 0.020 0.8922
## Residuals 8 0.7257 0.0907
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, there is no evidence of an interaction between R50
and
R21
. Only the main effect of R50
is significant. This finding is
consistent with what we have observed in the interaction plot. There,
the two lines were (almost) parallel. Adding R50#10 (significantly)
increases the free amino acids. For R21#10 the interaction plot
suggests a larger amount of free amino acids if we add it. However, the
effect is not significant. Hence it could very well have happened just
by chance.
Let us consider another example which is about fecundity of limpets
(some aquatic snails) (Quinn and Keough (2010)). We have a factor season
with
levels spring
and summer
and a factor density
with 3 levels 6
,
12
and 24
. Density is the number of limpets “sitting” under the same
wire mesh guard. For every combination of density
and season
we have
3 wire mesh guards whose average number of eggs we record. This will be
our response variable. Here, an experimental unit is given by a wire
mesh guard and not an individual limpet. We interpret density as a
factor and not a continuous variable, as this will give us more
flexibility.
## Create data (skip if not interested) ####
season < factor(rep(c("Spring", "Summer"), each = 6))
density < factor(rep(c(6, 12, 24), each = 3))
y < c(1.17, 0.50, 1.67, 1.50, 0.83, 1.00, 0.67, 0.67, 0.75,
4.00, 3.83, 3.83, 3.33, 2.58, 2.75, 2.54, 1.83, 1.63)
design < expand.grid(density = factor(c(6, 12, 24)), season = c("Spring", "Summer"))
snails < data.frame(design[rep(1:nrow(design), each = 3), ], y = y)
## Have a look at interaction plot ####
with(snails, interaction.plot(x.factor = density, trace.factor = season, response = y))
The interaction plot suggests that there might be an interaction between
season
and density
as the effect of density
is less pronounced in
spring
. Let us check whether this is significant or not.
## Df Sum Sq Mean Sq F value Pr(>F)
## season 1 17.131 17.131 119.373 1.36e07 ***
## density 2 4.001 2.001 13.940 0.000742 ***
## season:density 2 1.689 0.845 5.885 0.016552 *
## Residuals 12 1.722 0.144
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output we conclude that the interaction is significant, i.e. the effect of density is season specific (or the other way round). In such a situation we would (typically) stop inspecting whether the main effects are significant or not.
We could start inspecting some specific questions using appropriate
contrasts, e.g., what is the difference between density 6
and 12
in Summer
? From a technical point of view, this works best if we
construct a so called hyperfactor which has as levels all
possible combinations of season
and density
. In R
we can easily
do this with the function interaction
which “glues together” two
factors.
snails[, "combined"] < interaction(snails[, "season"], snails[, "density"])
levels(snails[, "combined"])
## [1] "Spring.6" "Summer.6" "Spring.12" "Summer.12" "Spring.24" "Summer.24"
fit.comb < aov(y ~ combined, data = snails)
library(multcomp)
## 12 vs. 6 in summer
fit.glht < glht(fit.comb, linfct = mcp(combined = c(0, 1, 0, 1, 0, 0)))
confint(fit.glht)
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Userdefined Contrasts
##
##
## Fit: aov(formula = y ~ combined, data = snails)
##
## Quantile = 2.1788
## 95% familywise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## 1 == 0 1.0000 1.6739 0.3261
This means that we would conclude that in summer, the difference between density 12 and 6 lies in the interval \([1.6739, 0.3261]\).
5.1.3 Single Replicates
If we only have a single observation in each “cell” (\(n = 1\)) we cannot do statistical inference anymore with a model including the interaction as we have no idea of the experimental error (for every treatment combination we only have one observation). However, we can still fit a main effects only model. If the data generating mechanism actually contains an interaction, we are fitting a wrong model. The consequences are that the corresponding tests will be too conservative, meaning pvalues will be too large. This is not a problem as the type 1 error rate is still controlled. We “just” lose power.
An alternative would be to use a Tukey onedegree of freedom interaction which uses only one additional parameter for the interaction, see Oehlert (2010).
Quite often, we can get rid of interactions if we look at the problem on a different scale, i.e., if we transform the response appropriately. A famous example is the logarithm. Effects that are multiplicative on the original scale become additive on the logscale, i.e., no interaction is needed on the logscale.
5.1.4 Checking Model Assumptions
As before we use the QQplot and the TukeyAnscombe plot. We use the previous data as example.
If in doubt whether this QQplot is still OK, we can always compare it to a couple of “nice” (simulated) QQplots of the same sample size using the following command (give it a try!)
Our QQplot looks like one from simulated data. The TukeyAnscombe plot looks OK too:
The careful reader might wonder why we only see 5 “stripes” instead of
6. The reason is that the two settings Spring, 6
and Spring, 12
have basically the same predicted value \(\widehat{\mu}_{ij}\) (see also
the interaction plot).
5.1.5 Unbalanced Data
We started with the (very) strong assumption that our data is balanced, i.e., for every treatment “setting” we have the same number of replicates. This assumptions made our life “easy” in the sense that we could (uniquely) decompose the variability into the different sources and we could estimate the parameters of the coefficients of a factor by ignoring the other factors. In practice, data is typically not balanced. Think for example of a situation where something goes wrong with an experiment in some of the settings. With unbalanced data, the aforementioned properties are (typically) destroyed. We still use least squares to estimate the parameters. The estimates willl look more complicated. However, this is not a problem these days and daily business in most regression problems anyway. More of a problem is the fact that the sum of squares cannot be uniquely partitioned into different sources anymore. This means that for some part of the variation of the data it is not clear to what source we should attribute it.
We “solve” this problem by using an appropriate model comparison approach. As seen before, see for example Equation (3.5), the sum of squares of a factor can be interpreted as the reduction of residual sum of squares when adding the corresponding factor to the model. In the balanced situation it would not matter whether the other factors are in the model or not: the reduction is always the same. Unfortunately, for unbalanced data this property does not hold anymore.
We use the following notation: With SS(B  1, A)
we denote the
reduction in residual sum of squares when comparing the model
(1, A, B)
(= y ~ A + B
) with (1, A)
(= y ~ A
). The 1
is the
overall mean \(\mu\) (that does not appear explicitly in the R
model
formula). Interpretation of the corresponding test is as follows:
“Do we need factor B
in the model if we already have factor A
(or
after having controlled for factor A
)?”. There are different “ways”
or “types” of model comparison approaches.
 Type I (sequential): Sequentially build up model (depends on the
“ordering” of the model terms!)
SS(A  1)
SS(B  1, A)
SS(AB  1, A, B)
 Type II (hierarchical): Control for the influence of the largest
hierarchical model not including the term of interest.
SS(A  1, B)
SS(B  1, A)
SS(AB  1, A, B)
 Type III (fully adjusted): Control for all other terms.
SS(A  1, B, AB)
SS(B  1, A, AB)
SS(AB  1, A, B)
Type I is what you get with summary
of an aov
object. Hence, for unbalanced
data you get different results whether you write y ~ A * B
or y ~ B * A
, see
also R FAQ 7.18.
For type II we can either use the function Anova
in the addon package
car
or we could compare the appropriate models with the function
anova
ourselves. For type III we can use the command drop1
(we have
to be careful that we set the contrast option to contr.sum
in this
special situation for technical reasons).
Typically, we take \(MS_E\) from the full model (including all terms) as the estimate for the error standard deviation to construct the corresponding \(F\)tests.
Note that interpretation of the test depends on the type that we use. This can be seen from the following remarks.
Some Technical Remarks
We use the slightly different notation \[ Y_{ijk} = \mu_{ij} + \epsilon_{ijk} \] where \(\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha \beta)_{ij}\). This means we use a cell means model with expected values \(\mu_{ij}\). We denote the corresponding sample sizes for the level combination \((i, j)\) with \(n_{ij} > 0\). It can be shown (Speed, Hocking, and Hackney (1978)) that both type I and type II sum of squares lead to tests that involve null hypotheses that depend on the sample sizes of the individual cells (!).
More precisely, for example for the main effect \(A\) we have:
For type I sum of squares we implicitly test the null hypothesis \[ H_0: \sum_{l = 1}^b \frac{n_{1l}}{n_{1\cdot}} \mu_{1l} = \cdots = \sum_{l = 1}^b \frac{n_{al}}{n_{a\cdot}} \mu_{al} \] where \(n_{i\cdot}\) is the sum over the corresponding \(n_{ij}\)’s. The above equation simply means that across all the rows, a weighted mean of the \(\mu_{ij}\)’s is the same. Note that the weights depend on the observed sample sizes \(n_{ij}\). Therefore, the observed sample sizes dictate our null hypothesis! Interpretation of such a research question is rather difficult.
Similarly for type II sum of squares we get a (more complicated) formula involving \(n_{ij}\)’s.
For type III sum of squares we have \[ H_0: \overline{\mu}_{1\cdot} = \cdots = \overline{\mu}_{a\cdot} \] where \(\overline{\mu}_{i\cdot} = \frac{1}{b} \sum_{l = 1}^b \mu_{il}\) which is the unweighted mean of the corresponding \(\mu_{ij}\)’s. This can be reformulated as \[ H_0: \alpha_1 = \ldots = \alpha_a \] This is what we have used so far, and more importantly, does not depend on \(n_{ij}\).
We can also observe that for a balanced design we test the same null hypothesis, no matter what type we use.
Unbalanced Data Example
We have a look at some (simulated) data about a sports experiment using
a factorial design with the factor “training method” (method
, with levels
conventional
and new
) and the factor “energy drink” (drink
,
with levels A
and B
). Response was running time in seconds for a specific track.
running < read.table("http://stat.ethz.ch/~meier/teaching/data/running.dat",
header = TRUE)
str(running)
## 'data.frame': 70 obs. of 3 variables:
## $ method: chr "new" "new" "new" "new" ...
## $ drink : chr "A" "A" "A" "A" ...
## $ y : num 40.6 49.7 42.1 42.2 39 44.2 44.1 43.1 44.7 46.3 ...
## drink
## method A B
## conventional 10 40
## new 10 10
Clearly, this is an unbalanced data set. The output of xtabs
gives
us all the \(n_{ij}\)’s.
We use contr.sum
, otherwise type III sum of squares will be wrong
(technical issue).
## Df Sum Sq Mean Sq F value Pr(>F)
## method 1 2024.0 2024.0 263.719 < 2e16 ***
## drink 1 455.2 455.2 59.316 9.05e11 ***
## method:drink 1 29.1 29.1 3.791 0.0558 .
## Residuals 66 506.5 7.7
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For example, the row method
tests the null hypothesis
\[
H_0: 0.2 \cdot \mu_{11} + 0.8 \cdot \mu_{12} = 0.5 \cdot \mu_{21} +
0.5 \cdot \mu_{22}
\]
(because 80% of conventional
use drink B
).
Now we change the order of the factors in the model formula.
## Df Sum Sq Mean Sq F value Pr(>F)
## drink 1 1145.9 1145.9 149.299 <2e16 ***
## method 1 1333.4 1333.4 173.737 <2e16 ***
## drink:method 1 29.1 29.1 3.791 0.0558 .
## Residuals 66 506.5 7.7
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that the sum of squares depend on the ordering of the model terms in the model formula if we use a type I approach. Hence, we also get different \(F\)ratios and different pvalues. However, the pvalues of the main effects stay very small here, no matter what “ordering” we use.
We can easily get type II sum of squares with the addon package car
.
## Anova Table (Type II tests)
##
## Response: y
## Sum Sq Df F value Pr(>F)
## method 1333.41 1 173.7365 < 2.2e16 ***
## drink 455.25 1 59.3164 9.053e11 ***
## method:drink 29.09 1 3.7908 0.05579 .
## Residuals 506.54 66
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We could also use the Anova
function for type III sum of squares, but
drop1
will do the job too.
## Type III sum of squares
drop1(fit, scope = ~., test = "F", data = running)
## or: Anova(fit, type = "III", data = running)
## Single term deletions
##
## Model:
## y ~ method * drink
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 506.54 146.54
## method 1 1352.40 1858.95 235.55 176.2110 < 2.2e16 ***
## drink 1 484.24 990.78 191.50 63.0935 3.347e11 ***
## method:drink 1 29.09 535.64 148.45 3.7908 0.05579 .
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now the row method
tests the null hypothesis
\[
H_0: \frac{1}{2} \cdot \mu_{11} + \frac{1}{2} \cdot \mu_{12} =
\frac{1}{2} \cdot \mu_{21} + \frac{1}{2} \cdot \mu_{22},
\]
which is nothing else than saying that the “rowaverages” of the
\(\mu_{ij}\)’s is constant. Here, the actual sample sizes don’t play a
role anymore.
5.2 More Than Two Factors
We can easily extend the model to more than two factors. If we have
three factors A
, B
and C
, we have \(3\) main effects, \(3 \cdot 2 / 2 = 3\) twoway interactions and one socalled threeway interaction. We
omit the mathematical details and work directly with the corresponding
R
Code. In R
we would write y ~ A * B * C
for a model including
all main effects, all twoway interactions and a threeway interaction.
An equivalent formulation would be y ~ A + B + C + A:B + A:C + B:C + A:B:C
.
Main effects are interpreted as average effects, twoway interaction effects are interpreted as deviations from the main effects model (i.e., the correction for an effect that depends on the level of the other factor) and the threeway interaction is an adjustment of the twoway interaction depending on the third factor (quite hard to interpret in practice).
In a balanced design, estimation is as for a twoway ANOVA model by ignoring the remaining factor. To estimate the threeway interaction, we simply subtract all estimates of the main effects and the twoway interactions. The degrees of freedom for the threeway interaction is the product of the degrees of freedom of all involved factors.
Consider Problem 8.6 in Oehlert (2010): “A study looked into the management of various tropical grasses for improved production, measured as dry matter yield in hundreds of pounds per acre over a 54week study period. The management variables were height of cut (1, 3, or 6 inches), the cutting interval (1, 3, 6, or 9 weeks), and amount of nitrogen fertilizer (0, 8, 16, or 32 hundred pounds of ammonium sulfate per acre per year). Fortyeight plots were assigned in completely randomized fashion to the 48 factorlevel combinations.”
We first read in the data,
make sure that all variables are factors in R
and set the labels of
the levels accordingly.
grasses < read.table("http://users.stat.umn.edu/~gary/book/fcdae.data/pr8.6",
header = TRUE)
grasses[, "ht"] < factor(grasses[, "ht"], labels = c("1", "3", "6"))
grasses[, "fert"] < factor(grasses[, "fert"], labels = c("0", "8", "16", "32"))
grasses[, "int"] < factor(grasses[, "int"], labels = c("1", "3", "6", "9"))
Let us check whether this worked as intended.
## 'data.frame': 48 obs. of 4 variables:
## $ ht : Factor w/ 3 levels "1","3","6": 1 1 1 1 1 1 1 1 1 1 ...
## $ fert: Factor w/ 4 levels "0","8","16","32": 1 1 1 1 2 2 2 2 3 3 ...
## $ int : Factor w/ 4 levels "1","3","6","9": 1 2 3 4 1 2 3 4 1 2 ...
## $ y : num 74.1 65.4 96.7 147.1 87.4 ...
We can visualize the data with (multiple) interaction plots. For example
We only have one observation for every combination of ht
, fert
and
int
.
## , , int = 1
##
## fert
## ht 0 8 16 32
## 1 1 1 1 1
## 3 1 1 1 1
## 6 1 1 1 1
##
## , , int = 3
##
## fert
## ht 0 8 16 32
## 1 1 1 1 1
## 3 1 1 1 1
## 6 1 1 1 1
##
## , , int = 6
##
## fert
## ht 0 8 16 32
## 1 1 1 1 1
## 3 1 1 1 1
## 6 1 1 1 1
##
## , , int = 9
##
## fert
## ht 0 8 16 32
## 1 1 1 1 1
## 3 1 1 1 1
## 6 1 1 1 1
Hence, we cannot do statistical inference for the full model
(including the threeway interaction) as we have no replicates. However,
we can still fit a model including all twoway interactions. In R
,
such a model can be easily specified with the formula y ~ .^2
where
.
stands for “all terms”.
fit.grasses < aov(y ~ .^2, data = grasses)
drop1(fit.grasses, scope = ~ ., test = "F") ## could also use summary (balanced design)
## Single term deletions
##
## Model:
## y ~ (ht + fert + int)^2
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 3155 260.90
## ht 2 29 3184 257.34 0.0830 0.92068
## fert 3 42072 45226 382.72 80.0153 1.334e10 ***
## int 3 73887 77042 408.28 140.5241 1.120e12 ***
## ht:fert 6 406 3561 254.71 0.3860 0.87835
## ht:int 6 3005 6160 281.02 2.8578 0.03903 *
## fert:int 9 5352 8506 290.51 3.3927 0.01313 *
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is evidence that both the effects of “height of cut” (ht
)
and “amount of fertilizer” (fert
) depend on the level of “cutting
interval” (int
), as both the ht:int
and fert:int
interactions
are significant.
The residuals look OK for this data set.
5.2.1 FAQ
When can I test a certain interaction?
If multiple observations are available for each combination of the levels of the involved factors we can test the interaction. We do not have to consider the remaining factors in the model. For example, if we have three factors \(A\), \(B\) and \(C\) we can do statistical inference for the interaction between \(A\) and \(B\), if we have multiple observations for each combination of the levels of \(A\) and \(B\) even if they correspond to different levels of \(C\).
I have a factor with 3 levels and one with 4 levels and 2 replicates for each level combination. Is this enough data for a twoway ANOVA?
This depends on the degrees of freedom of the error term. We have a total of \(3 \cdot 4 \cdot 2  1 = 23\) degrees of freedom. The main effects need \(2\) and \(3\) degrees of freedom, respectively, and the interaction \(2 \cdot 3 = 6\) degrees of freedom. Hence, the error has \(23  2  3  6 = 12\) degrees of freedom. According to the classical rule, this is enough. For a more thorough analysis, a detailed power calculation should be performed.
Bibliography
Oehlert, Gary. 2010. A First Course in Design and Analysis of Experiments. http://users.stat.umn.edu/~gary/book/fcdae.pdf.
Quinn, Gerry P., and Michael J. Keough. 2010. Experimental Design and Data Analysis for Biologists. Cambridge University Press.
Speed, M, R Hocking, and P Hackney. 1978. “Methods of Analysis of Linear Models with Unbalanced Data.” Journal of the American Statistical Association 73 (361): 105–12. https://doi.org/10.1080/01621459.1978.10480012.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2020. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.Rproject.org/package=ggplot2.