\[ \DeclareMathOperator{\argmin}{argmin} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cor}{Cor} \]

8 Incomplete Block Designs

8.1 Introduction

In this chapter we are going to learn more about block designs. The block designs in Chapter 5 were complete, meaning that every block contained all treatments. In practice, this is not always possible. For example, the actual physical size of a block might be too small. There are also situations where it is not advisable to have too many treatments in each block. In a food tasting experiment, we typically want to restrict the number of different recipes (treatments) that an individual rater has to taste. Otherwise, the responses become unreliable, for example because the raters get bored after some time.

If we cannot “afford” to do a complete design, we get a so-called incomplete block design (IBD). We then have to decide which subset of treatments we use in an individual block. If we do not choose wisely, the design can be flawed in the sense that certain quantities (e.g., some treatment differences) are not estimable anymore. This would be a worst-case scenario.

Consider the design in Table 8.1 with treatments \(A\) to \(F\) and blocks \(1\) to \(6\) (each column corresponds to a block). For example, in block 5 we apply the two treatments \(D\) and \(F\). Think of the treatments as different recipes and of the blocks as different raters.

TABLE 8.1: Example of an incomplete block design. Blocks are denoted by different columns, treatments by the letters \(A\) to \(F\).
1 2 3 4 5 6
\(A\) \(A\) \(B\) \(D\) \(D\) \(E\)
\(B\) \(C\) \(C\) \(E\) \(F\) \(F\)

First, we create a fictional data set having the same structure as in Table 8.1.

set.seed(25) ## for reproducibility
dat <- data.frame(block = factor(rep(1:6, times = 2)),
                  treat = factor(c("A", "A", "B", "D", "D", "E",
                                   "B", "C", "C", "E", "F", "F")),
                  ## block eff. + error term, no signal wrt. treat
                  y = round(rep(rnorm(6), times = 2) + 
                              rnorm(12), 1))
str(dat)
## 'data.frame':    12 obs. of  3 variables:
##  $ block: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 ...
##  $ treat: Factor w/ 6 levels "A","B","C","D",..: 1 1 2 4 4 5 ...
##  $ y    : num  1.6 1.7 0.9 0.7 -0.8 3.4 ...

We can get an overview of the treatment and block combinations that were used with the function xtabs.

xtabs(~ treat + block, data = dat)
##      block
## treat 1 2 3 4 5 6
##     A 1 1 0 0 0 0
##     B 1 0 1 0 0 0
##     C 0 1 1 0 0 0
##     D 0 0 0 1 1 0
##     E 0 0 0 1 0 1
##     F 0 0 0 0 1 1

What happens if we fit our standard main effects model here?

options(contrasts = c("contr.treatment", "contr.poly"))
fit.ibd <- aov(y ~ block + treat, data = dat)
dummy.coef(fit.ibd)
## Full coefficients are 
##                                              
## (Intercept):     1.1                         
## block:             1    2    3    4    5    6
##                  0.0  1.1  0.5 -2.9 -4.0  1.1
## treat:             A    B    C    D    E    F
##                  0.0 -1.2 -1.8  2.3  1.0  0.0

At first sight, it looks like we were able to fit the specified model. However, it is strange that two of the treatment coefficients are set to zero and not only one as usual when using contr.treatment.

What happens if we want to investigate a certain contrast, say treatment \(A\) vs. \(F\)?

library(multcomp)
summary(glht(fit.ibd, linfct = mcp(treat = c(1, 0, 0, 0, 0, -1))))

We get an error message (not shown) which says that some linear functions are not estimable. In fact, we cannot estimate the difference between the treatments \(A\) and \(F\) with this data set, at least not with a fixed effects model. The reason lies in the structure of the incomplete design: it is an example of a so-called disconnected design. We have the treatment set \(\{A, B, C\}\) in the first three blocks and \(\{D, E, F\}\) in the last three blocks (check the output of xtabs above). As the two sets are disjoint, we have no information about any differences involving treatments from both sets, because we never observe such a pair together in the same block. This also causes the strange output of dummy.coef above. Because of this dependency between treatment and block, we cannot distinguish block and treatment effects anymore; the effects are confounded. Here, R automatically drops one of the treatment levels from the model, which can be very dangerous if you do not carefully inspect the output.
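One way to see which coefficients R silently dropped is to inspect the aliasing structure of the fitted model. A minimal check using the base R function alias (output not shown; the exact labels depend on the contrast coding):

## show which coefficients are aliased (linearly dependent) and were
## therefore dropped from the fit
alias(fit.ibd)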

Hence, performing such a disconnected design was obviously not a good idea from a design of experiments point of view, as we cannot estimate any differences involving treatments from the two disjoint sets. Still, situations like these happen in practice: “Let’s do treatments \(A\) and \(B\) on the first few days and then switch to \(C\) and \(D\) because it is easier …”. If day is a block factor, this design will be disconnected.

Intuitively, we should have a good “mix” of treatments in each block.

8.2 Balanced Incomplete Block Designs

We could simply randomize subsets of treatments to different blocks. This would work well if we had enough blocks. However, if we only have a small number of blocks, we run the risk of ending up with a disconnected design.

On the other hand, we can also try to fulfill some optimality criterion. One candidate would be the criterion that we can estimate all treatment differences with the same precision. For example, this would mean that all confidence intervals for \(\alpha_i - \alpha_j\) have the same width (for any pair \(i,j\), where \(\alpha_i\) is the \(i\)th treatment effect). The design which fulfills this criterion is the so-called balanced incomplete block design.

A balanced incomplete block design (BIBD) is an incomplete block design where all pairs of treatments occur together in the same block equally often; we denote this number by \(\lambda\).

Using an example where blocks are given by raters and treatments by different chocolate chip cookie brands, we introduce the following notation, similar to Oehlert (2000):

  • \(g\): number of treatments (“number of different chocolate chip cookie brands”)
  • \(b\): number of blocks (“number of raters”)
  • \(k\): block size; number of experimental units per block (\(k < g\)) (“number of cookie brands each rater gets to taste”)
  • \(r\): number of replicates per treatment (“how often do we observe a certain cookie brand across all raters?”)
  • \(N\): total number of experimental units

By definition, we have \(N = b \cdot k = g \cdot r\). How can we construct a BIBD? For every setting with \(k < g\) we can find a BIBD by taking all possible subsets of size \(k\) of the \(g\) treatments, of which there are \({g \choose k}\) (the binomial coefficient, implemented in the function choose in R). The corresponding design is called an unreduced balanced incomplete block design. For example, if we have \(g = 6\) treatments and \(k = 3\) experimental units per block, we get \({6 \choose 3} = 20\) blocks. In R, we can easily list all these blocks with the function combn.

combn(x = 6, m = 3)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,]    1    1    1    1    1    1    1    1    1     1     2     2     2
## [2,]    2    2    2    2    3    3    3    4    4     5     3     3     3
## [3,]    3    4    5    6    4    5    6    5    6     6     4     5     6
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]     2     2     2     3     3     3     4
## [2,]     4     4     5     4     4     5     5
## [3,]     5     6     6     5     6     6     6

In the matrix above, every column defines a block. For example, block 10 would get treatments 1, 5 and 6. Typically, we would randomize the order or the placement of the treatments within a block, as we should always do.
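To turn this matrix into a design we can actually use, we can for example reshape it into long format and randomize the order of the treatments within each block. This is only a sketch; the object name des.unreduced is made up here.

blocks <- combn(x = 6, m = 3) ## every column is a block
des.unreduced <- data.frame(block = factor(rep(1:ncol(blocks),
                                               each = nrow(blocks))),
                            ## randomize treatment order within each block
                            treat = factor(c(apply(blocks, 2, sample))))
head(des.unreduced)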

In practice, we cannot always afford to do an unreduced BIBD, as the required number of blocks might be too large, for example, if we do not have enough raters. Whether a BIBD exists for a certain desired setting of number of blocks \(b\), block size \(k\) and number of treatments \(g\), is a combinatorial problem. A necessary, but not sufficient, condition for a BIBD to exist is \[\begin{equation} \frac{r \cdot (k - 1)}{g - 1} = \lambda, \tag{8.1} \end{equation}\] where \(\lambda\) is the number of times two treatments occur together in the same block (hence, an integer).

What is the intuition behind Equation (8.1)? A treatment appears in a total of \(r\) different blocks. In each of these blocks, there are \(k - 1\) remaining experimental units, leading to a total of \(r \cdot (k - 1)\) experimental units that receive the other treatments. Each of the \(g - 1\) other treatments must appear exactly \(\lambda\) times among these units. Hence, it must hold that \(r \cdot (k - 1) = \lambda \cdot (g - 1)\); otherwise, it is impossible to construct a BIBD. As already mentioned above, this condition is necessary, but not sufficient. This means that even if the condition is fulfilled, it might still be impossible to construct a BIBD.
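We can collect these conditions in a small helper function. This is only a sketch (the function name check.bibd.cond is made up here); it computes \(r\) and \(\lambda\) for a proposed setting and checks whether both are integers.

## necessary (but not sufficient) conditions for a BIBD with
## g treatments, b blocks and block size k
check.bibd.cond <- function(g, b, k) {
  r      <- b * k / g              ## replicates per treatment (N = b*k = g*r)
  lambda <- r * (k - 1) / (g - 1)  ## pairwise concurrences, Equation (8.1)
  c(r = r, lambda = lambda,
    possible = (r %% 1 == 0) && (lambda %% 1 == 0))
}
check.bibd.cond(g = 6, b = 10, k = 3) ## r = 5, lambda = 2: conditions hold
check.bibd.cond(g = 6, b = 9,  k = 3) ## r and lambda not integers: no BIBD exists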

In R, package ibd (Mandal 2019) provides some functionality to find (B)IBDs. For example, function bibd searches for a balanced incomplete block design using sophisticated numerical optimization techniques.

library(ibd)
des.bibd <- bibd(v = 6, b = 10, r = 5, k = 3, lambda = 2) 
## arguments of function bibd are:
## - v: number of treatments
## - b: number of blocks
## - r: number of replicates per treatment (across all blocks)
## - k: number of experimental units per block
## - lambda: lambda

des.bibd$design ## here, blocks are given by *rows*
##          [,1] [,2] [,3]
## Block-1     2    3    6
## Block-2     2    5    6
## Block-3     4    5    6
## Block-4     3    4    5
## Block-5     1    4    6
## Block-6     1    3    6
## Block-7     1    3    5
## Block-8     1    2    5
## Block-9     2    3    4
## Block-10    1    2    4

In des.bibd$design, each row corresponds to a block. For a setting where no BIBD is possible, we can try to find a design that is nearly balanced using the function ibd (function bibd would return "parameters do not satisfy necessary conditions" in such a situation).

des.ibd <- ibd(v = 6, b = 9, k = 3)
## - v: number of treatments
## - b: number of blocks
## - k: number of units per block
des.ibd$design
##         [,1] [,2] [,3]
## Block-1    1    5    6
## Block-2    1    4    5
## Block-3    3    5    6
## Block-4    3    4    6
## Block-5    1    3    4
## Block-6    1    2    6
## Block-7    2    4    6
## Block-8    2    4    5
## Block-9    2    3    5

We can find the so-called concurrence matrix of the generated design in des.ibd$conc.mat. It contains information about how many times any pair of treatments appears together in the same block.

des.ibd$conc.mat ## check for (un)balancedness
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    1    1    2    2    2
## [2,]    1    4    1    2    2    2
## [3,]    1    1    4    2    2    2
## [4,]    2    2    2    5    2    2
## [5,]    2    2    2    2    5    2
## [6,]    2    2    2    2    2    5

On the diagonal, we can read off how often every treatment appears across all blocks. In the off-diagonal elements, we have the corresponding information about treatment pairs. For example, treatments 1 and 4 appear together in the same block twice. On the other hand, treatments 1 and 3 appear together only once.
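The concurrence matrix can also be computed directly from the treatment-by-block incidence matrix \(X\) (a 1 whenever a treatment occurs in a block, a 0 otherwise) as \(X X^\top\). A small sketch to verify this for the design above:

## incidence matrix: rows = treatments 1 to 6, columns = blocks
X <- sapply(seq_len(nrow(des.ibd$design)),
            function(i) (1:6) %in% des.ibd$design[i, ]) * 1
X %*% t(X) ## should reproduce des.ibd$conc.mat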

For a given design, we would then randomize the different treatment subsets, i.e., the different rows in des.ibd$design, to the actual blocks, e.g., raters. Within each block, we would randomize the corresponding treatments to the experimental units, e.g., time-slots, and last but not least we would globally randomize the treatment numbers to the actual treatments, e.g., cookie brands.
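A minimal sketch of these three randomization steps in R (the brand and rater labels are of course made up for illustration):

set.seed(1)
des <- des.ibd$design
## 1) randomize the treatment subsets (rows) to the raters (blocks)
des <- des[sample(nrow(des)), ]
## 2) randomize the treatments within each rater to the time-slots
des <- t(apply(des, 1, sample))
## 3) randomize the treatment numbers 1 to 6 to the actual cookie brands
brands <- sample(paste0("brand", 1:6))
plan <- matrix(brands[des], nrow = nrow(des),
               dimnames = list(paste0("rater", 1:nrow(des)), NULL))
plan ## one row per rater, one column per time-slot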

8.3 Analysis of Incomplete Block Designs

8.3.1 Example: Taste Data

The analysis of an incomplete block design is “as usual.” We use a fixed block factor and a treatment factor leading to \[\begin{equation} Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}, \tag{8.2} \end{equation}\] where the \(\alpha_i\)’s are the treatment effects and the \(\beta_j\)’s are the block effects with the usual side constraints. Because we do not observe all the block and treatment combinations equally often (some are simply missing), we are faced with an unbalanced design. We typically use sums of squares for the treatment effects that are adjusted for the block effects.

Let us consider the data set taste from package daewr (Lawson and Krennrich 2021) as an example. Twelve different panelists each rated two out of four different recipes. We first load the data and try to get an overview.

data("taste", package = "daewr")
str(taste)
## 'data.frame':    24 obs. of  3 variables:
##  $ panelist: Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 ...
##  $ recipe  : Factor w/ 4 levels "A","B","C","D": 1 1 1 2 2 3 ...
##  $ score   : num  5 7 5 6 6 8 ...
(tab <- xtabs(~ recipe + panelist, data = taste))
##       panelist
## recipe 1 2 3 4 5 6 7 8 9 10 11 12
##      A 1 1 1 0 0 0 1 1 1  0  0  0
##      B 1 0 0 1 1 0 1 0 0  1  1  0
##      C 0 1 0 1 0 1 0 1 0  1  0  1
##      D 0 0 1 0 1 1 0 0 1  0  1  1

The output of xtabs tells us that we have an incomplete design here. It is actually a balanced incomplete block design. We can verify this with the function isGYD from package crossdes (Sailer 2013). To do so, we first have to transform the design information into the required form: a matrix with 12 rows (panelists) and two columns containing the two treatments each panelist rated. This can be done with the following rather technical code, where we extract the corresponding treatments column-wise.

library(crossdes)
(d <- t(apply(tab, 2, function(x) (1:4)[x != 0])))
##         
## panelist [,1] [,2]
##       1     1    2
##       2     1    3
##       3     1    4
##       4     2    3
##       5     2    4
##       6     3    4
##       7     1    2
##       8     1    3
##       9     1    4
##       10    2    3
##       11    2    4
##       12    3    4
isGYD(d) ## check whether the design is a BIBD
## 
## [1] The design is a balanced incomplete block design w.r.t. rows.

We use the model in Equation (8.2). As it is an unbalanced data set, we use drop1 such that we get the sum of squares of recipe adjusted for panelist. We could also use summary here because recipe appears last in the model formula.

fit.taste <- aov(score ~ panelist + recipe, data = taste)
drop1(fit.taste, test = "F")
## Single term deletions
## 
## Model:
## score ~ panelist + recipe
##          Df Sum of Sq   RSS   AIC F value Pr(>F)
## <none>                 6.87  0.00               
## panelist 11      7.46 14.33 -4.37    0.89  0.581
## recipe    3      9.13 16.00 14.27    3.98  0.046

The \(F\)-test of recipe is significant. To illustrate, we have a look at all pairwise comparisons without any multiple testing correction.

library(multcomp)
contr <- glht(fit.taste, linfct = mcp(recipe = "Tukey"))
summary(contr, test = adjusted("none")) 
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## ...
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)
## B - A == 0    0.750      0.618    1.21    0.256
## C - A == 0    1.375      0.618    2.22    0.053
## D - A == 0   -0.625      0.618   -1.01    0.338
## C - B == 0    0.625      0.618    1.01    0.338
## D - B == 0   -1.375      0.618   -2.22    0.053
## D - C == 0   -2.000      0.618   -3.24    0.010
## (Adjusted p values reported -- none method)

We are not inspecting the p-values here, as it is a multiple testing problem. However, what we can observe is the fact that all differences share the same standard error (remember the motivation for a BIBD?). This is a feature of this special design!
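We can also look at the corresponding simultaneous confidence intervals for all pairwise comparisons; because of the balance of the design, they all have the same width (output not shown).

confint(contr) ## all intervals have the same width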

8.3.2 Intra- and Inter-block Analysis

Up until now, we estimated treatment effects by adjusting for block effects. This means that whatever is special to a block is fully allocated to the block effect and does not affect the treatment effect. Basically, the estimate of the treatment effect is based on the “leftovers.” This is also known as an intra-block analysis.

On the other hand, if we treat the block factor as a random effect, the block means implicitly also contain information about the treatment effects. An analysis which is based on this information is known as an inter-block analysis. It leads to another estimate of the treatment effects. Both approaches can be combined. In practice, this can be achieved by using lmer and treating the block factor as a random effect. Typically, the benefit is only small; see also the discussion in Section 14.1.2 of Oehlert (2000).
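For the taste data, such a combined (intra- and inter-block) analysis would look as follows. This is only a sketch (the object name fit.taste.mixed is just for illustration); we would typically expect results very close to those of the fixed block effects analysis above.

library(lmerTest)
## panelist as a random block effect recovers the inter-block information
fit.taste.mixed <- lmer(score ~ recipe + (1 | panelist), data = taste)
anova(fit.taste.mixed) ## F-test for recipe
summary(glht(fit.taste.mixed, linfct = mcp(recipe = "Tukey")),
        test = adjusted("none"))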

However, there are also situations where the effect can be dramatic. If we consider the introductory example with the disconnected design, some magic happens:

library(lmerTest)
fit.ibd2 <- lmer(y ~ treat + (1 | block), data = dat)
summary(glht(fit.ibd2, linfct = mcp(treat = c(1, 0, 0, 0, 0, -1))))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## ...
## Linear Hypotheses:
##        Estimate Std. Error z value Pr(>|z|)
## 1 == 0     2.39       1.77    1.35     0.18
## (Adjusted p values reported -- single-step method)

Here it is: an estimate of the difference between \(A\) and \(F\,\)! However, this is standing on very thin ice and you should not rely on this trick in practice!

8.4 Outlook

A balanced incomplete block design is of course only a very special case of an incomplete block design. A more general version is a so-called partially balanced incomplete block design (PBIBD) where some treatment pairs occur together in the same block more often than other pairs (defined through so-called associate classes). Hence, not all treatment differences are estimated with the same precision. This is useful if, for example, more precision is needed for comparisons with a control treatment than for comparisons among the remaining treatments. See for example Kuehl (2000) and Oehlert (2000) for more details.

With two block factors we can also create block designs which are incomplete in one or both “directions,” so-called row-column designs, see also Section 5.4. An example would be a design where the rows form a complete block design but the columns an incomplete block design. Again, more details can be found in Kuehl (2000) and Oehlert (2000).

8.5 Concluding Remarks

We have now learned about some of the more important experimental designs and the corresponding statistical analyses. The last few chapters were mostly in the form of short introductions (as it says in the book title), and more information can be found in the references mentioned in the corresponding chapters. A good place to start looking for more information about R packages is the collection of CRAN task views, which provide a nice overview of packages suitable for specific topics. There is a task view “Design of Experiments (DoE) & Analysis of Experimental Data” which contains some of the packages that we have already used in this book, but of course also many more.