Preface

This book should help you get familiar with analysis of variance (ANOVA) and mixed models in R (R Core Team 2021). From a methodological point of view, we build upon the knowledge of an introductory course in probability and statistics covering the basic concepts of statistical inference (estimation, hypothesis tests, confidence intervals) up to the two-sample \(t\)-test. See for example Dalgaard (2008) for an introduction to both the theory and the corresponding functions in R. A more theoretical reference is Rice (2007).

There are of course already well-established, excellent textbooks covering ANOVA, including experimental design, in great detail. Examples are Oehlert (2000), Kuehl (2000), Montgomery (2019) and many more. We build upon these great books. From a mathematical point of view, we use notation similar to that of Oehlert (2000). The goal of this book is to provide a compact overview of the most important topics, including the corresponding applications in R using flexible mixed model approaches. We also use examples from the classical textbooks and redo the corresponding statistical analyses in R.

As this is an introductory text, the focus is on getting to know multiple experimental design types, when they are used and what a proper analysis in R looks like. This is why we will not cover all the details, especially for the more advanced topics. The idea is that if the reader is familiar with the basic concepts and their applications in R, this knowledge can be extended (and applied) to other areas.

Besides discussing the theory and the corresponding R functions, we also try to give you an intuition for when and how things can go wrong and which aspects have to be considered in practice. This is not only useful when planning an experiment on your own, but also when analyzing data from other sources or when reading a research paper.

From a statistical point of view, an ANOVA model is nothing more than a special case of a linear regression model. Note that no prior knowledge of linear regression is needed for this book. For the basic models, we mostly use the function aov in R in order to get the “classical” outputs. In fact, aov simply calls lm (the linear regression model fitting function) and adjusts the output accordingly. We sometimes mention extensions to more general linear regression models. However, this book is not meant to be an introductory text to linear regression. See for example Fox and Weisberg (2019) or Faraway (2005) for applied introductions.
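As a small illustration of this relationship, the following sketch fits the same one-factor model once with aov and once with lm. It is not one of the book's examples: the data set PlantGrowth (shipped with base R) and the object names are chosen here purely for illustration.

```r
# PlantGrowth is part of base R (package datasets):
# dried plant weight for a control and two treatment groups.
fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

# The "classical" ANOVA table produced from the aov fit ...
summary(fit_aov)

# ... and the ANOVA table computed from the linear regression fit.
anova(fit_lm)

# Extract the F statistic for the factor group from both tables.
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- anova(fit_lm)[["F value"]][1]

# An aov object is also an lm object, and both fits agree.
inherits(fit_aov, "lm")
all.equal(f_aov, f_lm)
all.equal(coef(fit_aov), coef(fit_lm))
```

Both tables report the same sums of squares, the same F statistic and the same p-value; only the presentation of the output differs.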

If not stated otherwise, we use a significance level of 5% if we make statements about statistical significance, or equivalently, a coverage level of 95% for the corresponding confidence intervals.
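The correspondence between the 5% significance level and 95% confidence intervals can be checked with a short sketch. Again, the PlantGrowth data and the object names are only assumptions made for this illustration. In base R, confint uses level = 0.95 by default, which matches the convention stated above.

```r
# Fit a one-factor model to the built-in PlantGrowth data.
fit <- lm(weight ~ group, data = PlantGrowth)

# confint() uses level = 0.95 unless told otherwise.
ci_default <- confint(fit)
ci_95      <- confint(fit, level = 0.95)
identical(ci_default, ci_95)

# p-values of the t-tests for the individual coefficients.
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# A coefficient is significant on the 5% level exactly when
# its 95% confidence interval does not contain zero.
significant   <- p_values < 0.05
excludes_zero <- ci_95[, 1] > 0 | ci_95[, 2] < 0
cbind(ci_95, p_value = p_values, significant, excludes_zero)
```

The last two columns of the resulting table always agree, because the confidence interval and the test are based on the same \(t\)-distribution.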

If you find any errors or inconsistencies, or if you think something is missing, please e-mail me or fill out the anonymous feedback form at https://goo.gl/ZBvjj9.

The most recent version of this book and a list of errors can be found at https://stat.ethz.ch/~meier/teaching/book-anova/.

Structure of the Book

We begin with a non-technical introduction to the general principles of experimental design in Chapter 1. Chapter 2 then introduces the first models for designs with only one factor. More specific questions regarding these models are then discussed in Chapter 3, including the problem of multiple testing. Chapter 4 introduces factorial designs, which arise if a treatment is a combination of multiple factors. A short introduction to complete block designs, which are a great way to increase power or precision, can be found in Chapter 5. Chapter 6 introduces a new class of models including random and fixed effects, the so-called mixed models, which are very popular in many applied areas. Some more special designs follow: Chapter 7 introduces a new class of designs which can deal with experimental units of different sizes, the so-called split-plot designs. We conclude with Chapter 8 about block designs with small blocks that cannot accommodate all treatments, so-called incomplete block designs.

Software Information and Conventions

This book uses a lot of R code. If you are completely new to R, you can get more information for example at https://cran.r-project.org/manuals.html or https://education.rstudio.com/.

The R code and output have the following form:

text <- "Let's get started ..."
paste(text, "now!", sep = " ")

## [1] "Let's get started ... now!"

This means that output lines start with two comment signs “##”. For better readability, we sometimes shorten the R output a bit. If we remove multiple lines, this will be indicated with “## ...”, i.e., two comment signs and three dots, in the output.

Regarding plots, we mostly use base R graphics. For more complex plots we switch to ggplot2 (Wickham 2016).

We often load data directly from the web, either in tabular format using the function read.table, or already as an R object, using the function readRDS.
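The two ways of loading data can be sketched as follows. To keep the example independent of any web address, it first writes a small made-up data frame to temporary files and then reads it back; the data frame and all object names are hypothetical. For data on the web, read.table accepts a URL in place of the file name, and readRDS can be given a connection created with url().

```r
# A small made-up data set, used only for this illustration.
d <- data.frame(group = c("a", "a", "b", "b"),
                y     = c(1.2, 0.8, 2.1, 1.9))

# Store it once in tabular (text) format and once as an R object.
txt_file <- tempfile(fileext = ".txt")
rds_file <- tempfile(fileext = ".rds")
write.table(d, file = txt_file, row.names = FALSE)
saveRDS(d, file = rds_file)

# Tabular format: the first line of the file contains the column names.
d_txt <- read.table(txt_file, header = TRUE)

# R object: readRDS returns the object exactly as it was saved.
d_rds <- readRDS(rds_file)

str(d_txt)
str(d_rds)
```

An object read with readRDS keeps all its attributes (for example the levels of a factor), whereas read.table has to infer the column types from the text.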

The packages knitr (Xie 2015) and bookdown (Xie 2021) were used to compile this book.

Acknowledgments

First, I’d like to thank all members of the Seminar für Statistik at ETH Zürich for such a nice and fun working and research environment and for making it possible to work on this project. I learned a lot a long time ago from a wise man nicknamed “Puma” while working in a building named “LEO”. Hans-Rudolf Roth, you are missed!

Many people contributed in various ways to this book. Special thanks go to Peter Bühlmann, Markus Kalisch, Marloes Maathuis, Christoph Buck, Claude Renaux, Camilla Gerboth, Tanja Finger, Michael Zellinger, Reto Zihlmann and Bill Perry.

I also want to thank Rob Calver from Chapman & Hall/CRC Press for the support and patience.

Finally, and most importantly, I would like to thank my family for all the support.

Lukas Meier
Zürich, Switzerland

1 Learning from Data