# Chapter 6 Block Designs

Quite often, we already know that experimental units are *not*
homogeneous. Using a completely randomized design in such a situation
would still be a valid procedure. However, making explicit use of the
special “structure” of the experimental units typically helps to reduce
variance (“getting a more precise picture”). In your introductory course,
you learned how to apply the paired \(t\)-test. It was used for
situations where two treatments were applied to the same “object”
or “subject”. Think, for example, of applying two treatments (in parallel)
to human beings. We know that people can be (very) different. Because
we apply both treatments to the same subject, we get a “clear
picture” within every subject (the difference between the two
treatments). By taking the difference, the person-to-person variation
automatically disappears. We also say that we “block” on persons.
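The following small simulated sketch illustrates this idea. It assumes hypothetical data for six subjects. The squared paired \(t\)-statistic coincides with the \(F\)-statistic of the treatment factor in a two-way ANOVA that includes `subject` as a block factor, and the two p-values agree as well. The numbers are made up; only the equivalence matters.

```r
## Hypothetical data: 6 subjects, each receiving treatments A and B
set.seed(1)
n <- 6
subject_effect <- rnorm(n, sd = 2)             # large person-to-person variation
y_a <- 10.0 + subject_effect + rnorm(n, sd = 0.3)
y_b <- 10.5 + subject_effect + rnorm(n, sd = 0.3)

## Paired t-test: works with the within-subject differences
paired <- t.test(y_a, y_b, paired = TRUE)

## Same data analyzed with subject as a block factor
dat <- data.frame(y = c(y_a, y_b),
                  treatment = factor(rep(c("A", "B"), each = n)),
                  subject = factor(rep(seq_len(n), times = 2)))
tab <- summary(aov(y ~ subject + treatment, data = dat))[[1]]
f_treat <- tab[["F value"]][2]   # row 2 = treatment
p_treat <- tab[["Pr(>F)"]][2]

c(t_squared = unname(paired$statistic)^2, F = f_treat)
c(p_paired = paired$p.value, p_anova = p_treat)
```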

We will now extend this idea to situations with \(g > 2\) treatments, where \(g\) is the number of levels of our treatment factor (as in Chapter 3).

## 6.1 Randomized Complete Block Designs

Assume that we can divide our experimental units into \(r\) groups, also
known as **blocks**, containing \(g\)
experimental units each. Think for example of an agricultural experiment
at \(r\) different locations having \(g\) different plots of land each.
Hence, a block is given by a location and an experimental unit by a plot
of land.

The **randomized complete block design (RCBD)** uses a **restricted randomization scheme**:
*Within* every block (e.g., location), the treatments are randomized to
the experimental units (e.g., plots of land). The design is called
*complete* because we see the complete set of treatments within every
block (we will later also learn about *incomplete* block designs where
this is not the case anymore). Note that blocking already exists at the
time of randomization (and not only at the time of the analysis).
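The restricted randomization can be sketched in a few lines of base `R`. The sketch assumes \(g = 4\) hypothetical treatments `A` to `D` and \(r = 3\) blocks. Within each block, we draw an independent random permutation of the treatments and assign it to the plots \(1, \ldots, g\).

```r
set.seed(123)                          # for reproducibility only
treatments <- c("A", "B", "C", "D")   # g = 4 hypothetical treatments
r <- 3                                 # number of blocks

## Within every block, assign the treatments to plots 1..g in random order
plan <- do.call(rbind, lapply(seq_len(r), function(b) {
  data.frame(block = b,
             plot = seq_along(treatments),
             treatment = sample(treatments))
}))
plan
```

Every block contains the complete set of treatments. Only their order within the block is random.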

In the most basic form, we assume that we do **not** have replicates
within a block. This means that we only see every treatment once in each
block.

The analysis of a randomized complete block design is straightforward.
We treat the block factor as “another” factor in our model. As we have
no replicates within blocks, we can “only” fit a main effects model of
the form
\[
Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},
\]
where the \(\alpha_i\)’s (\(i = 1, \ldots, g\)) are the treatment effects and
the \(\beta_j\)’s (\(j = 1, \ldots, r\)) are the **block effects**, both with the
usual side constraints. In addition, we make the usual assumptions on the error
term \(\epsilon_{ij}\). This model implicitly assumes that blocks only cause
additive shifts, i.e., that there is no interaction between treatment and block.
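For this balanced design, the least-squares estimates have a simple closed form. Assume sum-to-zero side constraints \(\sum_i \alpha_i = 0\) and \(\sum_j \beta_j = 0\). Let \(i = 1, \ldots, g\) index treatments and \(j = 1, \ldots, r\) index blocks, and let a dot denote averaging over the corresponding index. Then
\[
\hat{\mu} = \bar{y}_{\cdot\cdot}, \qquad
\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}, \qquad
\hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot},
\]
with residuals \(e_{ij} = y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}\). The total sum of squares splits as
\[
\underbrace{\sum_{i,j} (y_{ij} - \bar{y}_{\cdot\cdot})^2}_{SS_T}
= \underbrace{r \sum_i \hat{\alpha}_i^2}_{SS_{\text{Trt}}}
+ \underbrace{g \sum_j \hat{\beta}_j^2}_{SS_{\text{Block}}}
+ \underbrace{\sum_{i,j} e_{ij}^2}_{SS_E},
\]
with \(g - 1\), \(r - 1\) and \((g - 1)(r - 1)\) degrees of freedom, respectively (\(gr - 1\) in total).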

Let us now consider the hardness testing experiment from Montgomery (2012):

“For example, consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. By measuring the depth of the depression caused by the tip, the hardness of the specimen is determined. […] Suppose we wish to determine whether or not four different tips produce different readings on a hardness testing machine. The experimenter has decided to obtain four observations on Rockwell C-scale hardness for each tip. There is only one factor - tip type - and a completely randomized single-factor design would consist of randomly assigning each one of the \(4 \times 4 = 16\) runs to an experimental unit, that is, a metal coupon, and observing the hardness reading that results. Thus, 16 different metal test coupons would be required in this experiment, one for each run in the design. There is a potentially serious problem with a completely randomized experiment in this design situation. If the metal coupons differ slightly in their hardness, as might happen if they are taken from ingots that are produced in different heats, the experimental units (the coupons) will contribute to the variability observed in the hardness data. As a result, the experimental error will reflect both random error and variability between coupons. We would like to make the experimental error as small as possible; that is, we would like to remove the variability between coupons from the experimental error. A design that would accomplish this requires the experimenter to test each tip once on each of four coupons.”

This is a randomized complete block design. We now fit a main-effects-only
model to these data in `R` and get the “usual” ANOVA table.

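```
## Create data (skip if not interested) ####
## 4 tips (treatment) x 4 coupons (blocks): each tip is tested once per coupon
tip <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
## Hardness readings: one line per tip, coupons 1 to 4 within each line
y <- c(9.3, 9.4, 9.6, 10,
       9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7,
       9.7, 9.6, 10, 10.2)
hardness <- data.frame(y, tip, coupon)
## Analyze data ####
## Main effects model: block factor coupon plus treatment factor tip
fit <- aov(y ~ coupon + tip, data = hardness)
summary(fit)
```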

```
## Df Sum Sq Mean Sq F value Pr(>F)
## coupon 3 0.825 0.27500 30.94 4.52e-05 ***
## tip 3 0.385 0.12833 14.44 0.000871 ***
## Residuals 9 0.080 0.00889
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
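As a check, the sums of squares in this table can be reproduced by hand from the tip (treatment) and coupon (block) means. The following sketch recreates the same data and uses only base `R`. Object names such as `ss_tip` are our own choice.

```r
## Hardness data as above
tip <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
y <- c(9.3, 9.4, 9.6, 10,
       9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7,
       9.7, 9.6, 10, 10.2)
g <- nlevels(tip)      # number of treatments
r <- nlevels(coupon)   # number of blocks

grand_mean   <- mean(y)
tip_means    <- tapply(y, tip, mean)
coupon_means <- tapply(y, coupon, mean)

ss_tip    <- r * sum((tip_means - grand_mean)^2)      # treatment SS
ss_coupon <- g * sum((coupon_means - grand_mean)^2)   # block SS
ss_total  <- sum((y - grand_mean)^2)
ss_error  <- ss_total - ss_tip - ss_coupon            # residual SS

ms_error <- ss_error / ((g - 1) * (r - 1))
f_tip    <- (ss_tip / (g - 1)) / ms_error

round(c(ss_tip = ss_tip, ss_coupon = ss_coupon, ss_error = ss_error, F_tip = f_tip), 3)
```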

We first focus on the p-value of `tip` (0.000871). Clearly, we can reject
the null hypothesis that there is no overall effect of tip type. Typically,
we do not inspect the p-value of the block factor `coupon`. There is some
historical debate about why we should not do this, mainly because we did
*not* randomize the blocks (and we already knew beforehand that blocks
would show an effect). However, we can do a quick check to verify whether
blocking was efficient. We would like the block factor to explain a lot of
variation. Hence, if the mean square of the block factor is much larger
than the error mean square \(MS_E\), we conclude that blocking was
efficient. Here, this is the case, as \(0.275 \gg 0.00889\).
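To get a feeling for what blocking gained us, we can refit the hardness data while ignoring `coupon`, purely for illustration. This treats the data as if they came from a completely randomized design, so the coupon sum of squares ends up in the error term. It is not a valid analysis of the experiment as it was actually run. It only shows how much larger \(MS_E\) would be without the block factor.

```r
## Hardness data as above
tip <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))
y <- c(9.3, 9.4, 9.6, 10,
       9.4, 9.3, 9.8, 9.9,
       9.2, 9.4, 9.5, 9.7,
       9.7, 9.6, 10, 10.2)
hardness <- data.frame(y, tip, coupon)

tab_rcbd <- summary(aov(y ~ coupon + tip, data = hardness))[[1]]
tab_crd  <- summary(aov(y ~ tip, data = hardness))[[1]]   # ignores the blocks

## Error mean squares: last row of each table
ms_e_rcbd <- tail(tab_rcbd[["Mean Sq"]], 1)
ms_e_crd  <- tail(tab_crd[["Mean Sq"]], 1)
ms_block  <- tab_rcbd[["Mean Sq"]][1]

c(block_ms_over_mse     = ms_block / ms_e_rcbd,
  mse_rcbd              = ms_e_rcbd,
  mse_ignoring_blocks   = ms_e_crd,
  F_tip_rcbd            = tab_rcbd[["F value"]][2],
  F_tip_ignoring_blocks = tab_crd[["F value"]][1])
```

With the blocks ignored, \(MS_E\) grows from about 0.0089 to about 0.075. The \(F\)-ratio for `tip` drops from about 14.4 to about 1.7.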

Instead of a single treatment factor, we can also have a factorial
treatment structure within every block. Think, for example, of a
two-factor factorial, which we would model as `Y ~ Block + A * B`. Here,
we could actually test the interaction between `A` and `B` even if every
level combination of `A` and `B` appears only *once* in every block. As
we have multiple blocks, we have multiple observations for every level
combination of `A` and `B`!
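Here is a small simulated sketch. It assumes a hypothetical \(2 \times 2\) factorial (`A`, `B`) in three blocks, with every level combination appearing once per block. The residual degrees of freedom stay positive, so the `A:B` interaction can be tested. The response values are random and only serve to make the code run.

```r
set.seed(42)
## Every A x B combination once in each of 3 blocks (12 runs in total)
dat <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                   Block = c("Block1", "Block2", "Block3"))
dat$Y <- rnorm(nrow(dat), mean = 10)   # hypothetical response

fit <- aov(Y ~ Block + A * B, data = dat)
tab <- summary(fit)[[1]]
tab[["Df"]]   # Block, A, B, A:B, Residuals: 2 1 1 1 6
```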

## 6.2 Multiple Block Factors

We can also block on more than one factor. A special case is the
so-called **Latin Square Design** where we
have *two* block factors and one treatment factor having \(g\) levels each
(yes, all!). This is *very* restrictive. Consider the following layout
where we have a block factor with levels \(R_1\) to \(R_4\) (“rows”),
another block factor with levels \(C_1\) to \(C_4\) (“columns”) and a
treatment factor with levels \(A\) to \(D\).

In a Latin Square Design, each treatment (Latin letter) appears
exactly once in each row and exactly once in each column. It is an
example of a so-called **row-column design**.

| | \(C_1\) | \(C_2\) | \(C_3\) | \(C_4\) |
|---|---|---|---|---|
| \(R_1\) | \(A\) | \(B\) | \(C\) | \(D\) |
| \(R_2\) | \(B\) | \(C\) | \(D\) | \(A\) |
| \(R_3\) | \(C\) | \(D\) | \(A\) | \(B\) |
| \(R_4\) | \(D\) | \(A\) | \(B\) | \(C\) |
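The defining property can be checked mechanically. The following minimal base-`R` sketch builds the cyclic square shown in the table, where each row is the previous one shifted by one position. It then verifies that every letter occurs exactly once in each row and each column. The helper `is_latin_square` is hypothetical and not part of any package.

```r
## Cyclic 4 x 4 square from the table: row j is row 1 shifted by j - 1 positions
g <- 4
square <- matrix(LETTERS[(outer(0:(g - 1), 0:(g - 1), "+") %% g) + 1],
                 nrow = g,
                 dimnames = list(paste0("R", 1:g), paste0("C", 1:g)))

## Hypothetical helper: TRUE if each symbol appears exactly once per row and column
is_latin_square <- function(m) {
  symbols <- unique(as.vector(m))
  n <- nrow(m)
  n == ncol(m) && length(symbols) == n &&
    all(apply(m, 1, function(x) setequal(x, symbols) && !anyDuplicated(x))) &&
    all(apply(m, 2, function(x) setequal(x, symbols) && !anyDuplicated(x)))
}

square
is_latin_square(square)
```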

We can create a Latin Square Design in `R`, for example, with the
function `design.lsd` of the add-on package `agricolae`
(de Mendiburu 2020).

```
## [,1] [,2] [,3] [,4]
## [1,] "A" "C" "B" "D"
## [2,] "C" "A" "D" "B"
## [3,] "D" "B" "A" "C"
## [4,] "B" "D" "C" "A"
```
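If `agricolae` is not available, one simple way to obtain a randomized Latin Square in base `R` is sketched below. Start from the cyclic square, then randomly permute its rows, its columns and the treatment labels. This is a sketch under that assumption, not the algorithm used by `design.lsd`. It also does not sample uniformly from all Latin Squares of a given size. The function name `random_latin_square` is hypothetical.

```r
## Hypothetical helper: randomized g x g Latin square built from the cyclic square
random_latin_square <- function(treatments) {
  g <- length(treatments)
  cyclic <- (outer(0:(g - 1), 0:(g - 1), "+") %% g) + 1   # entries 1..g
  cyclic <- cyclic[sample(g), sample(g)]                  # permute rows and columns
  labels <- sample(treatments)                            # permute treatment labels
  matrix(labels[cyclic], nrow = g)
}

set.seed(2024)
lsd <- random_latin_square(c("A", "B", "C", "D"))
lsd
```

Permuting rows, columns or symbols of a Latin Square always yields another Latin Square, so the result keeps the defining property.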

A Latin Square blocks on both rows and columns *simultaneously*. We can
use the model
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk},
\]
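to analyze data from a Latin Square Design. Here, the \(\alpha_i\)’s are the
treatment effects, and the \(\beta_j\)’s and \(\gamma_k\)’s are the row and
column **block effects**, with the usual side constraints. Only \(g^2\) of the
\(g^3\) index combinations \((i, j, k)\) are observed: the design determines
which treatment \(i\) is applied in row \(j\) and column \(k\).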
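Counting degrees of freedom shows how little is left for the error term. There are \(g^2\) observations, and each of the three factors uses \(g - 1\) degrees of freedom, so
\[
\underbrace{g^2 - 1}_{\text{total}}
= \underbrace{(g - 1)}_{\text{treatment}}
+ \underbrace{(g - 1)}_{\text{rows}}
+ \underbrace{(g - 1)}_{\text{columns}}
+ \underbrace{(g - 1)(g - 2)}_{\text{error}}.
\]
For example, a \(4 \times 4\) square leaves \(3 \cdot 2 = 6\) error degrees of freedom, while a \(3 \times 3\) square leaves only \(2\).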

The design is balanced, so our usual estimators and sums of squares still
“work” (treatments, rows and columns are mutually orthogonal). In `R`, we
would use the model formula `Y ~ Block1 + Block2 + Treat`.
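Here is a simulated sketch of this analysis. It assumes the cyclic \(4 \times 4\) square from the table above and made-up responses. We fit `Y ~ Block1 + Block2 + Treat` with `aov`. We then check that, because of the balance, the treatment sum of squares does not depend on the order of the terms in the formula.

```r
set.seed(7)
g <- 4
## Cyclic Latin square: treatment index for row j, column k
trt_index <- (outer(0:(g - 1), 0:(g - 1), "+") %% g) + 1
dat <- data.frame(Block1 = factor(rep(1:g, times = g)),   # rows
                  Block2 = factor(rep(1:g, each = g)),    # columns
                  Treat  = factor(LETTERS[as.vector(trt_index)]))
dat$Y <- rnorm(g^2, mean = 10)   # hypothetical response

tab1 <- summary(aov(Y ~ Block1 + Block2 + Treat, data = dat))[[1]]
tab2 <- summary(aov(Y ~ Treat + Block1 + Block2, data = dat))[[1]]

tab1[["Df"]]                                  # 3 3 3 6
c(tab1[["Sum Sq"]][3], tab2[["Sum Sq"]][1])   # same treatment SS in both orders
```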

### Bibliography

de Mendiburu, Felipe. 2020. *Agricolae: Statistical Procedures for Agricultural Research*. http://tarwi.lamolina.edu.pe/~fmendiburu.

Montgomery, D. C. 2012. *Design and Analysis of Experiments*. John Wiley & Sons, Incorporated.