---
title: "Layout Customization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Layout Customization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

The package offers a suite of `align_*` functions designed to give you precise
control over plot layout. These functions enable you to reorder the observations
or partition the observations into multiple groups.

Currently, there are four key `align_*` functions available for layout customization:

- **`align_group`**: Group and align plots based on categorical factors.
- **`align_order`**: Reorder layout observations based on statistical weights
 or allows for manual reordering based on user-defined criteria.
- **`align_kmeans`**: Arrange plots by k-means clustering results.
- **`align_dendro`**: Align plots according to hierarchical clustering or dendrograms.

```{r setup}
library(ggalign)
```

```{r setup_data}
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))
```

## `align_group`

The `align_group()` function allows you to group rows/columns into separate
panels. It doesn't add any plot area. 

```{r align_group_top}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(sample(letters[1:4], ncol(small_mat), replace = TRUE))
```

By default, the facet strip text is removed. You can override this behavior with
`theme(strip.text = element_text())`. Since `align_group()` does not create a
new plot, the panel title can only be added to the heatmap plot.
```{r align_group_left}
ggheatmap(small_mat) +
    theme(strip.text = element_text()) +
    hmanno("l") +
    align_group(sample(letters[1:4], nrow(small_mat), replace = TRUE))
```

## `align_order`
The `align_order()` function order the rows/columns based on the summary
weights, Like `align_group()`, it doesn't add a plot area. 

Here, we order the rows based on the means.
```{r align_order}
ggheatmap(small_mat) +
    hmanno("l") +
    align_order(rowMeans)
```

In addition, we can provide the ordering integer index directly in the `order`
argument.
```{r align_order_integer_index}
my_order <- sample(nrow(small_mat))
print(rownames(small_mat)[my_order])
ggheatmap(small_mat) +
    hmanno("l") +
    align_order(my_order)
```

We can also provide the ordering character index.
```{r align_order_character_index}
ggheatmap(small_mat) +
    hmanno("l") +
    align_order(rownames(small_mat)[my_order])
```

By default, `align_order()` reorders the rows or columns in ascending order of
the summary function's output (from bottom to top for rows, or from left to
right for columns). To reverse this order, you can set `reverse = TRUE`: 
```{r align_order_reverse}
ggheatmap(small_mat) +
    hmanno("l") +
    align_order(rowMeans, reverse = TRUE)
```

Some `align_*` functions accept a `data` argument. This can be a matrix, a data
frame, or even a simple vector, which will be converted into a one-column
matrix. If the `data` argument is `NULL`, the function will use the layout data,
as demonstrated in the previous example. The `data` argument can also accept a
function (purrr-like lambda syntax is supported), which will be applied to the
layout data. 

It is important to note that all `align_*` functions consider the `rows` as
the observations. It means the `NROW(data)` must return the same number with
the specific `layout` axis.

 - `heatmap_layout()`/`ggheatmap()`: for column annotation, the `layout` data
 will be transposed before using (If data is a `function`, it will be applied
 with the transposed matrix). This is necessary because column annotation uses
 heatmap columns as observations, but we need rows.

 - `stack_layout()`/`ggstack()`: the `layout` data will be used as it is since
 we place all plots along a single axis.

Even for top and bottom annotations, you can use `rowMeans()` to calculate the
mean value across all columns. 

```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_order(rowMeans)
```

## `align_kmeans`
The `align_kmeans()` function groups heatmap rows or columns based on k-means
clustering. Like the previous functions, it does not add a plot area. 
```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_kmeans(3L)
```

It's important to note that all `align_*` functions which define groups must not
break the previous established groups. This means the new groups must nest in
the old groups, in this way, usually they cannot be used if groups already
exist.

```{r error=TRUE}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(sample(letters[1:4], ncol(small_mat), replace = TRUE)) +
    align_kmeans(3L)
```

```{r error=TRUE}
ggheatmap(small_mat) +
    hmanno("t") +
    align_kmeans(3L) +
    align_group(sample(letters[1:4], ncol(small_mat), replace = TRUE))
```

## align_dendro
The `align_dendro()` function adds a dendrogram to the layout and can also
reorder or split the layout based on hierarchical clustering. This is
particularly useful for working with heatmap plots.

```{r align_dendro}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro()
```

Hierarchical clustering is performed in two steps: calculate the distance matrix
and apply clustering. You can use the `distance` and `method` argument to
control the dendrogram builind process.

There are two ways to specify `distance` metric for clustering:

- specify `distance` as a pre-defined option. The valid values are the supported
methods in `dist()` function and coorelation coefficient `"pearson"`,
`"spearman"` and `"kendall"`. The correlation distance is defined as `1 - cor(x,
y, method = distance)`.
- a self-defined function which calculates distance from a matrix. The function
should only contain one argument. Please note for clustering on columns, the
matrix will be transposed automatically. 

```{r align_dendro_distance_pearson}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(distance = "pearson") +
    patch_titles(top = "pre-defined distance method (1 - pearson)")
```

```{r align_dendro_distance_function}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(distance = function(m) dist(m)) +
    patch_titles(top = "a function that calculates distance matrix")
```

Method to perform hierarchical clustering can be specified by `method`. Possible
methods are those supported in `hclust()` function. And you can also provide a
self-defined function, which accepts the distance object and return a `hclust`
object.

```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(method = "ward.D2")
```

The dendrogram can also be used to cut the columns/rows into groups. You can
specify `k` or `h`, which work similarly to `cutree()`: 

```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(k = 3L)
```

In contrast to `align_group()`, `align_kmeans()`, and `align_order()`,
`align_dendro()` is capable of drawing plot components. So it has a default
`set_context` value of `TRUE`, meaning it will set the active context of the
annotation stack layout. In this way, we can add any ggplot elements to this
plot area.

```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro() +
    geom_point(aes(y = y))
```

The `align_dendro()` function creates default `node` data for the ggplot. See
`ggplot2 specification` in `?align_dendro` for details. Additionally, `edge`
data is added to the `ggplote::geom_segment()` layer directly, used to draw the
dendrogram tree. One useful variable in both `node` and `edge` data is the
`branch` column, corresponding to the `cutree` result: 

```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(aes(color = branch), k = 3) +
    geom_point(aes(color = branch, y = y))
```

You can reorder the dendrogram based on the mean values of the observations by
setting `reorder_dendrogram = TRUE`.
```{r fig.width=10}
h1 <- ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(aes(color = branch), k = 3, reorder_dendrogram = TRUE) +
    ggtitle("reorder_dendrogram = TRUE")
h2 <- ggheatmap(small_mat) +
    hmanno("t") +
    align_dendro(aes(color = branch), k = 3) +
    ggtitle("reorder_dendrogram = FALSE")
align_plots(h1, h2)
```

`align_dendro()` can also perform clustering between groups, meaning it can be
used even if there are existing groups present in the layout, in this way, you
cannot specify `k` or `h`: 

```{r}
set.seed(3L)
column_groups <- sample(letters[1:3], ncol(small_mat), replace = TRUE)
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(column_groups) +
    align_dendro(aes(color = branch))
```

You can reorder the groups by setting `reorder_group = TRUE`.
```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(column_groups) +
    align_dendro(aes(color = branch), reorder_group = TRUE)
```

You can merge the sub-tree in each group by settting `merge_dendrogram = TRUE`.
```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(column_groups) +
    align_dendro(aes(color = branch), merge_dendrogram = TRUE)
```

You can reorder the dendrogram and merge simutaneously.
```{r}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(column_groups) +
    align_dendro(aes(color = branch),
        reorder_group = TRUE,
        merge_dendrogram = TRUE
    ) +
    hmanno("b") +
    align_dendro(aes(color = branch),
        reorder_group = FALSE,
        merge_dendrogram = TRUE
    )
```

If you specify `k` or `h`, this will always turn off sub-clustering. The same
principle applies to `align_dendro()`, where new groups must be nested within
the previously established groups.  
```{r error=TRUE}
ggheatmap(small_mat) +
    hmanno("t") +
    align_group(column_groups) +
    align_dendro(k = 2L)
```

## Session information
```{r}
sessionInfo()
```