---
title: "Introduction to forest plots"
author: "Max Gordon"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    keep_md: true
vignette: >
  %\VignetteIndexEntry{Introduction to forest plots}
  %\usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, echo=FALSE}
knitr::opts_chunk$set(fig.width = 7,
                      fig.height = 3)
```

[Forest plots](https://en.wikipedia.org/wiki/Forest_plot) date back to [1970s](https://doi.org/10.1136%2Fbmj.322.7300.1479) and are most frequently seen in [meta-analysis](https://en.wikipedia.org/wiki/Meta-analysis), but are in no way restricted to these. The forestplot package facilitates the creation of forest plots in R. It originated from the ['rmeta'](https://cran.r-project.org/package=rmeta)-package's `forestplot` function and, beyond generating standard forest plots, includes several additional features:

* **Text:**
    + Ability to use a table of text, i.e. the text can consist of several columns if needed&Dagger;
    + Supports expressions for mathematical symbols, e.g., expression(beta).
    + Set the gpar arguments (`fontfamily`, `fontface`, `cex`, etc)  for both summary and regular rows. This can be specified down to the each cell.
* **Confidence intervals:**
    + Automatically clip the confidence intervals, adding arrows when they exceed specified limits&Dagger;
    + Multiple confidence bands for the same row
    + Choose between different estimate markers such as boxes, diamonds, points
    + Custom confidence interval drawing functions
* **Legends:**
    + Have a legend on top or to the left of the plot
    + Have the legend within the plot's graph section
    + Put a box around legend (sharp or rounded corners)
* **Other:**
    + Dividing the graph visually by adding horizontal lines
    + Choose line height to either adapt to viewport (graph) size or specify an exact height in `unit`s
    + Choose between a zero-effect line line or an area box
    + Use flexible arguments, you can choose if you want to provide mean, lower, and upper separately or within one array.

&Dagger; Features present in the original `rmeta::forestplot` function.

**Note:** A key difference from the original `forestplot` is that the current function interprets xlog as indicating a logarithmic x-axis. Thus, data should be provided in the *antilog (exp)* format.

Text
====

Forest plots are closely tied to text, and the ability to customize this text is crucial.

```{r}
library(forestplot)
library(dplyr)
```


Table of text
-------------

Below is a basic example from the original `forestplot` function that shows how to use a table of text:

```{r, fig.height=4, fig.width=8, message=FALSE}
# Cochrane data from the 'rmeta'-package
base_data <- tibble::tibble(mean  = c(0.578, 0.165, 0.246, 0.700, 0.348, 0.139, 1.017),
                            lower = c(0.372, 0.018, 0.072, 0.333, 0.083, 0.016, 0.365),
                            upper = c(0.898, 1.517, 0.833, 1.474, 1.455, 1.209, 2.831),
                            study = c("Auckland", "Block", "Doran", "Gamsu",
                                      "Morrison", "Papageorgiou", "Tauesch"),
                            deaths_steroid = c("36", "1", "4", "14", "3", "1", "8"),
                            deaths_placebo = c("60", "5", "11", "20", "7", "7", "10"),
                            OR = c("0.58", "0.16", "0.25", "0.70", "0.35", "0.14", "1.02"))

base_data |>
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR),
             clip = c(0.1, 2.5),
             xlog = TRUE) |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue") |>
  fp_add_header(study = c("", "Study"),
                deaths_steroid = c("Deaths", "(steroid)"),
                deaths_placebo = c("Deaths", "(placebo)"),
                OR = c("", "OR")) |>
  fp_append_row(mean  = 0.531,
                lower = 0.386,
                upper = 0.731,
                study = "Summary",
                OR = "0.53",
                is.summary = TRUE) |>
  fp_set_zebra_style("#EFEFEF")
```

Summary lines
-------------

The same as above but with lines based on the summary elements and also using a direct call with matrix input instead of relying on dplyr.

```{r, fig.height=4, fig.width=8, message=FALSE}
base_data |>
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR),
             clip = c(0.1, 2.5),
             xlog = TRUE) |>
  fp_add_lines() |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               align = "lrrr",
               hrz_lines = "#999999") |>
  fp_add_header(study = c("", "Study"),
                deaths_steroid = c("Deaths", "(steroid)") |>
                  fp_align_center(),
                deaths_placebo = c("Deaths", "(placebo)") |>
                  fp_align_center(),
                OR = c("", fp_align_center("OR"))) |>
  fp_append_row(mean  = 0.531,
                lower = 0.386,
                upper = 0.731,
                study = "Summary",
                OR = "0.53",
                is.summary = TRUE)
```

You can specify which lines to modify by providing a list with the line number as the name. In the example below, the 3rd and 11th lines are modified, counting from the first line above the first row (note that there is an empty row before the summary).:

```{r, fig.height=4, fig.width=8, message=FALSE}
base_data |>
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR),
             clip = c(0.1, 2.5),
             xlog = TRUE) |>
  fp_add_lines(h_3 = gpar(lty = 2),
               h_11 = gpar(lwd = 1, columns = 1:4, col = "#000044")) |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               align = "lrrr",
               hrz_lines = "#999999") |>
  fp_add_header(study = c("", "Study"),
                deaths_steroid = c("Deaths", "(steroid)") |>
                  fp_align_center(),
                deaths_placebo = c("Deaths", "(placebo)") |>
                  fp_align_center(),
                OR = c("", fp_align_center("OR"))) |>
  fp_append_row(mean  = 0.531,
                lower = 0.386,
                upper = 0.731,
                study = "Summary",
                OR = "0.53",
                is.summary = TRUE)
```

Adding vertices to the whiskers
-------------------------------

For marking the start/end points it is common to add a vertical line at the end of each whisker. In forestplot you simply specify the `vertices` argument:

```{r, fig.height=4, fig.width=8, message=FALSE}
base_data |>
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR),
             clip = c(0.1, 2.5),
             vertices = TRUE,
             xlog = TRUE) |>
  fp_add_lines(h_3 = gpar(lty = 2),
               h_11 = gpar(lwd = 1, columns = 1:4, col = "#000044")) |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               align = "lrrr",
               hrz_lines = "#999999")  |>
  fp_add_header(study = c("", "Study"),
                deaths_steroid = c("Deaths", "(steroid)") |>
                  fp_align_center(),
                deaths_placebo = c("Deaths", "(placebo)") |>
                  fp_align_center(),
                OR = c("", fp_align_center("OR"))) |>
  fp_append_row(mean  = 0.531,
                lower = 0.386,
                upper = 0.731,
                study = "Summary",
                OR = "0.53",
                is.summary = TRUE)
```

Positioning the graph element
-----------------------------

You can also choose to have the graph positioned within the text table by specifying the `graph.pos` argument using the fp_:

```{r}
base_data |>
  forestplot(labeltext = c(study, deaths_steroid, deaths_placebo, OR),
             clip = c(0.1, 2.5),
             vertices = TRUE,
             xlog = TRUE) |>
  fp_add_lines(h_3 = gpar(lty = 2),
               h_11 = gpar(lwd = 1, columns = 1:4, col = "#000044")) |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               hrz_lines = "#999999") |>
  fp_add_header(study = c("", "Study"),
                deaths_steroid = c("Deaths", "(steroid)"),
                deaths_placebo = c("Deaths", "(placebo)"),
                OR = c("", "OR")) |>
  fp_append_row(mean  = 0.531,
                lower = 0.386,
                upper = 0.731,
                study = "Summary",
                OR = "0.53",
                is.summary = TRUE) |>
  fp_decorate_graph(box = gpar(lty = 2, col = "lightgray"),
                    graph.pos = 4) |>
  fp_set_zebra_style("#f9f9f9")
```

Using expressions
-----------------

When presenting regression output, it can be useful to include non-ASCII characters. We will use my study comparing health related quality of life 1 year after total hip arthroplasties between Sweden and Denmark for this section:

```{r}
data(dfHRQoL)

dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             xlab = "EQ-5D index") |>
  fp_add_header(est = expression(beta)) |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue")
```

Altering fonts
--------------

Altering fonts may give a completely different feel to the table:

```{r}
# You can set directly the font to desired value, the next three lines are just for handling MacOs on CRAN
font <- "mono"
if (grepl("Ubuntu", Sys.info()["version"])) {
  font <- "HersheyGothicEnglish"
}
dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             xlab = "EQ-5D index") |>
  fp_add_header(est = "Est.") |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               txt_gp = fpTxtGp(label = gpar(fontfamily = font)))
```

There is also the possibility of being selective in gp-styles:

```{r}
dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             xlab = "EQ-5D index") |>
  fp_add_header(est = "Est.") |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue",
               txt_gp = fpTxtGp(label = list(gpar(fontfamily = font),
                                             gpar(fontfamily = "",
                                                  col = "#660000")),
                                ticks = gpar(fontfamily = "", cex = 1),
                                xlab  = gpar(fontfamily = font, cex = 1.5)))
```

Confidence intervals
====================

Clipping the interval is convenient for uncertain estimates in order to retain the resolution for those of more interest. The clipping simply adds an arrow to the confidence interval, see the bottom estimate below:

```{r}
dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             clip = c(-.1, Inf),
             xlab = "EQ-5D index") |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue")
```

Custom box size
---------------

You can force the box size to a certain size through the `boxsize` argument.

```{r}
dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             boxsize = 0.2,
             clip = c(-.1, Inf),
             xlab = "EQ-5D index") |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue")
```

If you want to keep the relative sizes you need to provide a wrapper to the draw function that transforms the boxes. Below shows how this is done, also how you combine multiple forestplots into one image:

```{r fig.width=10, fig.height=4}
fp_sweden <- dfHRQoL |>
  filter(group == "Sweden") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             title = "Sweden",
             clip = c(-.1, Inf),
             xlab = "EQ-5D index",
             new_page = FALSE)

fp_denmark <- dfHRQoL |>
  filter(group == "Denmark") |>
  mutate(est = sprintf("%.2f", mean), .after = labeltext) |>
  forestplot(labeltext = c(labeltext, est),
             title = "Denmark",
             clip = c(-.1, Inf),
             xlab = "EQ-5D index",
             new_page = FALSE)

library(grid)
grid.newpage()
borderWidth <- unit(4, "pt")
width <- unit(convertX(unit(1, "npc") - borderWidth, unitTo = "npc", valueOnly = TRUE)/2, "npc")
pushViewport(viewport(layout = grid.layout(nrow = 1,
                                           ncol = 3,
                                           widths = unit.c(width,
                                                           borderWidth,
                                                           width))
                      )
             )
pushViewport(viewport(layout.pos.row = 1,
                      layout.pos.col = 1))
fp_sweden |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue")
upViewport()
pushViewport(viewport(layout.pos.row = 1,
                      layout.pos.col = 2))
grid.rect(gp = gpar(fill = "#dddddd", col = "#eeeeee"))
upViewport()
pushViewport(viewport(layout.pos.row = 1,
                      layout.pos.col = 3))
fp_denmark |>
  fp_set_style(box = "royalblue",
               line = "darkblue",
               summary = "royalblue")
upViewport(2)
```


Multiple confidence bands
-------------------------

When combining similar outcomes for the same exposure I've found it useful to use multiple bands per row. This efficiently increases the data-ink ratio while making the comparison between the two bands trivial. The first time I've used this was in [my paper](https://doi.org/10.1186/1471-2474-14-316) comparing Swedish with Danish patients 1 year after total hip arthroplasty. Here the clipping also becomes obvious as the Danish sample was much smaller, resulting in wider confidence intervals. In version 2.0+, which supports dplyr, groups can be merged into a single table.

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(clip = c(-.1, 0.075),
             ci.vertices = TRUE,
             ci.vertices.height = 0.05,
             boxsize = .1,
             lineheight = "lines",
             xlab = "EQ-5D index") |>
  fp_add_lines("steelblue") |>
  fp_add_header("Variable") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")),
               default = gpar(vertices = TRUE))
```

Estimate indicator
------------------

You can choose between a number of different estimate indicators. Using the example above we can set the Danish results to circles.

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")),
               default = gpar(vertices = TRUE)) |>
  fp_set_zebra_style("#F5F9F9")
```

The confidence interval/box drawing functions are fully customizeable. You can write your own function that accepts the parameters: lower_limit, estimate, upper_limit, size, y.offset, clr.line, clr.marker, and lwd.

Choosing line type
------------------

You can also choose from all available line types through the *lty.ci* that can also be specified element specific.

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             lty.ci = c(1, 2),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")),
               default = gpar(vertices = TRUE))
```


Legends
=======

Legends are automatically added when using `group_by` but we can also control them directly through the `legend` argument:

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(legend = c("Swedes", "Danes"),
             fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")))
```

This can be further customized by setting the `legend_args` argument using the `fpLegend` function:

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(legend = c("Swedes", "Danes"),
             legend_args = fpLegend(pos = list(x = .85, y = 0.25),
                                    gp = gpar(col = "#CCCCCC", fill = "#F9F9F9")),
             fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")))
```


Ticks and grids
===============

If the automated ticks don't match the desired ones, it is easy to change them using the `xticks` argument:

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xticks = c(-.1, -0.05, 0, .05),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")))
```

By adding a "labels" attribute to the ticks you can tailor the ticks even further, here's an example the suppresses tick text for every other tick:

```{r}
xticks <- seq(from = -.1, to = .05, by = 0.025)
xtlab <- rep(c(TRUE, FALSE), length.out = length(xticks))
attr(xticks, "labels") <- xtlab

dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xticks = xticks,
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555")))
```

Sometimes you have a very tall graph and you want to add helper lines in order
to make it easier to see the tick marks. This can be useful in non-inferiority
or equivalence studies. You can do this through the `grid` argument:

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xticks = c(-.1, -0.05, 0, .05),
             zero = 0,
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555"))) |>
  fp_decorate_graph(grid = structure(c(-.1, -.05, .05),
                              gp = gpar(lty = 2, col = "#CCCCFF")))
```

You can easily customize both what grid lines to use and what type they should be by adding the gpar object to a vector:

```{r}
dfHRQoL |>
  group_by(group) |>
  forestplot(fn.ci_norm = c(fpDrawNormalCI, fpDrawCircleCI),
             boxsize = .25, # We set the box size to better visualize the type
             line.margin = .1, # We need to add this to avoid crowding
             clip = c(-.125, 0.075),
             xlab = "EQ-5D index") |>
  fp_set_style(box = c("blue", "darkred") |> lapply(function(x) gpar(fill = x, col = "#555555"))) |>
  fp_decorate_graph(grid = structure(c(-.1, -.05, .05),
                                     gp = gpar(lty = 2, col = "#CCCCFF")))
```

If you are unfamiliar with the structure call it is equivalent to generating a vector and then setting an attribute, eg:

```{r, eval=FALSE, echo=TRUE}
grid_arg <- c(-.1, -.05, .05)
attr(grid_arg, "gp") <- gpar(lty = 2, col = "#CCCCFF")

identical(grid_arg,
          structure(c(-.1, -.05, .05),
                    gp = gpar(lty = 2, col = "#CCCCFF")))
# Returns TRUE
```


That covers most of the key features. I hope you find the `forestplot` package useful.