---
title: "Methods"
bibliography: bibliography.bib
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Methods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This vignette defines the models and historical borrowing metrics supported in the `historicalborrowlong` package.

# Models

## Definitions

Unless otherwise specified, Greek letters refer to parameters estimated by the model, and Roman letters refer to fixed hyperparameters and other constants set by the user in advance.

* "Study": a clinical trial. Could be a borrowed clinical trial from the past or a current clinical trial under analysis.
* "Group": a study arm, such as a treatment group or placebo group.
* "Rep": a repeated measure in the context of repeated measures / longitudinal modeling. Could be a time point such as a study visit.
* $k$: study index.
* $K$: index of the current study.
* $y_k$: vector of patient-by-rep clinical responses to a continuous outcome variable for study $k$.
* $(X)_{k}$: the row of matrix $X$ corresponding to study $k$.
* $\alpha$: Vector of control group mean parameters, one for each rep for the pooled model, one for each study and rep in the hierarchical and independent models. The first elements are for the historical studies, and the last one is for the current study.
* $\delta$: Vector of study-specific treatment mean parameters. There is one for each combination of study, non-control treatment group, and rep. An optional constraint may be added to pool all arms at baseline within each study, which reduces the number of elements of $\delta$.
* $d$: integer index for the elements of $\delta$.
* $b$: integer index for the elements of $\beta$.
* $t$: index of a rep (e.g. time point of a repeated measure such as a patient visit in a clinical trial.)
* $\beta$: Vector of study-specific baseline covariate parameters.
* $X_\alpha$: model matrix for the control group mean parameters $\alpha$. It has indicator columns to select the appropriate element of $\alpha$ for each element of $y_k$.
* $X_\delta$: model matrix for the treatment mean parameters $\delta$. It has indicator columns to select the appropriate element of $\delta$ for each element of $y_k$.
* $X_\beta$: model matrix for the baseline covariate fixed effect parameters $\beta$. It has indicator columns to select the appropriate element of $\beta$ for each element of $y_k$.
* $\sigma_k$: Vector of rep-specific residual standard deviations for study $k$.
* $\Lambda_k$: lower-triangular Cholesky factor of the by-rep residual correlation matrix for study $k$.
* $N_k$: number of patients in study $k$.
* $\Sigma_k$: by-rep residual covariance matrix of study $k$.
* $T$: number of repeated measures per subject.
* $I_T$: identity matrix with rows and columns equal to the number of repeated measures per subject.
* $I(\cdot)$: indicator function
* $AR(1)(n, \rho)$: an AR(1) correlation matrix with $n$ rows and correlation parameter $\rho$.
* $m_k$ index to indicate the type of residual covariance of study $k$: 1 for unstructured / fully parameterized, 2 for AR(1), and 3 for diagonal.

## Baseline covariates

The baseline covariates model matrix $X_\beta$ adjusts for baseline covariates. It may contain a continuous column for baseline and binary indicator columns for the levels of user-defined covariates. All these columns are included if possible, but the method automatically drops baseline covariate columns to ensure that the combine model matrix $X_k^* = \left [ {X_\alpha}^* \quad {X_\delta}^* \quad {X_\beta}^* \right ]_k$ is full rank, where $X_k^*$ denotes the rows of matrix $X$ corresponding to study $k$, with additional rows dropped if the corresponding elements of $y$ are missing. The choice of columns to drop from ${X_\beta}_k^*$ is determined by the rank and pivoting strategy of the QR decomposition of $X_k$ using the Householder algorithm with pivoting (`base::qr()`, LINPACK routine DQRDC).

Separately within each study, each column of $X_\beta$ is centered to have mean 0, and if possible, scaled to have variance 1. Scaling ensures that the priors on parameters $\beta$ remain relatively diffuse relative to the input data. Study-level centering ensures that the $\alpha$ parameters truly act as *unconditional* study-specific control group means (as opposed to conditional on the subset of patients at the reference level of $X_\beta$), and it ensures that borrowing across $\alpha$ components fully presents as control group borrowing.

## Model matrices

Each primary model is parameterized thus:

$$
\begin{aligned}
E(y) = \left( X_\alpha \right)_k \alpha + \left ( X_\delta \right)_k \delta + \left ( X_\beta  \right)_k \beta
\end{aligned}
$$

Above, $\left (X_\alpha \right)_k$, $\left (X_\delta \right)_k$, and $\left (X_\beta \right)_k$ are fixed matrices for study $k$. $\left (X_\beta \right)_k$ is a conventional model matrix for the baseline covariates $\beta$, and the details are explained in the "Baseline covariates" section below. $\left (X_\alpha \right)_k$ is a matrix of zeroes and ones. It is constructed such that each scalar component of $\alpha$ is the mean response of the control group in a particular study at a given time point. Likewise, $\left (X_\delta \right)_k$ is a matrix of zeroes and ones such that each scalar component of $\delta$ is the mean response of a non-control treatment group in a particular study at a given time point.

To illustrate, let $y_{ijkt}$ be patient $i$ in treatment group $j$ (where $j = 1$ is the control group) of study $k$ at time point $t$, and let $\left ( X_\beta \beta \right )_{ijkt}$ be the corresponding scalar element of the vector $\left ( X_\beta \right ) \beta$. Then,

$$
\begin{aligned}
E(y_{ijkt}) = I (j = 1) \alpha_{kt} + I (j > 1) \delta_{jkt} + \left ( X_\beta \beta \right )_{ijkt}
\end{aligned}
$$

In addition, if the constraint in the parameterization is activated (i.e. `hbl_mcmc_hierarchical(constraint = TRUE)`) then the control and treatment patients are pooled at time point $t = 1$ within each study $k$:

$$
\begin{aligned}
E(y_{ijk1}) = \alpha_{k1} + \left ( X_\beta \beta \right )_{ijk1}
\end{aligned}
$$

This parameterization is represented in the more compact expression $\left( X_\alpha \right)_k \alpha + \left ( X_\delta \right)_k \delta + \left ( X_\beta  \right)_k \beta$ in the model definitions in this vignette.

## Post-processing

The `hbl_summary()` function post-processes the results from the model. It accepts MCMC samples of parameters and returns estimated marginal means of the response and treatment effect. To estimate marginal means of the response, `hbl_summary()` takes group-level averages of posterior samples of fitted values while dropping covariate adjustment terms from the model (i.e. $X_\alpha \alpha + X_\delta \delta$). Because the columns of $X_\beta$ are centered at their means, this choice is mathematically equivalent to `emmeans::emmeans()` with the `weights = "proportional"` (@lenth2016).

## Hierarchical model

Functions:

* `hbl_sim_hierarchical()`
* `hbl_mcmc_hierarchical()`

The hierarchical model analyzes the data from all studies and shrinks the control study-by-rep means $\alpha_{kt}$ (one scalar parameter for each unique combination of study and rep) towards a common normal distribution with mean $\mu_t$ and variance $\tau_t^2$. For each study in the data (both current and historical), the covariance is user-defined. Options include:

1. Fully parameterized ("unstructured") with a separation strategy with the LKJ prior to model within-subject correlations among residuals.
1. AR(1) variances $\sigma_k$ and correlation $\rho_k$.
1. Diagonal with variances $\sigma_k$.

$$
\begin{aligned}
& y_k \sim \text{MVN}((X_\alpha)_k \cdot \alpha + (X_\delta)_k \cdot \delta + (X_\beta)_k \cdot \beta, \ I_{N_k} \otimes \Sigma_k ) \\
& \qquad \alpha_{kt} \stackrel{\text{ind}}{\sim} \text{Normal} (\mu_t, \tau_t^2) \\
& \qquad \qquad \mu_t \stackrel{\text{ind}}{\sim} \text{Normal}(0, s_\mu^2) \\
& \qquad \qquad \tau_t \stackrel{\text{ind}}{\sim} f_\tau \\
& \qquad \delta_{dt} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\
& \qquad \beta_{b} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\
& \qquad \Sigma_k = \left (I_T \sigma_k \right ) \Lambda_k \Lambda_k' \left (I_T \sigma_k \right ) \\
& \qquad \qquad \sigma_{k1}, \ldots, \sigma_{kT} \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \\
& \qquad \qquad \Lambda_k \Lambda_k' \sim \begin{cases}
  \text{LKJ}(\text{shape} = s_\lambda, \ \text{order} = T) && m_k = 1 \\
  \text{AR(1)}(T,\rho_k) && m_k = 2 \\
  I_T && m_k = 3 \\
\end{cases} \\
& \qquad \qquad \rho_k \stackrel{\text{ind}}{\sim} \text{Uniform}(-1, 1) \qquad (\text{only for } m_k = 2)
\end{aligned} 
$$

The prior $f_\tau$ on $\tau$ is critically important because:

1. It controls the prior amount of borrowing, and
2. The prior has a large influence if there are few historical studies in the data.

$f_\tau$ can either be a flexible half-Student-t distribution with $d_\tau$ degrees of freedom and scale parameter $s_\tau$:

$$
f_\tau = \text{Student-t}(0, s_\tau, d_\tau)^+
$$
or a uniform distribution with lower bound 0 and upper bound $s_\tau$:

$$
f_\tau = \text{Uniform}(0, s_\tau)
$$

Following the recommendation of @gelman2006, please use half-Student-t if the number of historical studies is small and consider uniform for large numbers of historical studies.

For the half-Student-t distribution, the role of the $s_\tau$ parameter is equivalent to the $\sigma$ parameter from the [Student-t parameterization in the Stan user manual](https://mc-stan.org/docs/functions-reference/unbounded_continuous_distributions.html#student-t-distribution).

## Independent model

Functions:

* `hbl_sim_independent()`
* `hbl_mcmc_independent()`

The independent model is the same as the hierarchical model, but with independent control group parameters $\alpha$. We use it as a no-borrowing benchmark to quantify the borrowing strength of the hierarchical model.

$$
\begin{aligned}
& y_k \sim \text{MVN}((X_\alpha)_k \cdot \alpha + (X_\delta)_k \cdot \delta + (X_\beta)_k \cdot \beta, \ I_{N_k} \otimes \Sigma_k ) \\
& \qquad \alpha_{kt} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\alpha^2) \\
& \qquad \delta_{dt} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\
& \qquad \beta_{b} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\
& \qquad \Sigma_k = \left (I_T \sigma_k \right ) \Lambda_k \Lambda_k' \left (I_T \sigma_k \right ) \\
& \qquad \qquad \sigma_{k1}, \ldots, \sigma_{kT} \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \\
& \qquad \qquad \Lambda_k \Lambda_k' \sim \begin{cases}
  \text{LKJ}(\text{shape} = s_\lambda, \ \text{order} = T) && m_k = 1 \\
  \text{AR(1)}(T,\rho_k) && m_k = 2 \\
  I_T && m_k = 3 \\
\end{cases} \\
& \qquad \qquad \rho_k \stackrel{\text{ind}}{\sim} \text{Uniform}(-1, 1) \qquad (\text{only for } m_k = 2)
\end{aligned} 
$$


## Pooled model

Functions:

* `hbl_sim_pool()`
* `hbl_mcmc_pool()`

The pooled model is the same as the independent model, but with rep-specific control means pooled across studies. In other words $\alpha_{kt}$ loses the $k$ subscript, and we use a smaller matrix $\left (X_\alpha^{\text{pool}} \right )_k$ instead of $(X_\alpha)_k$. $\left (X_\alpha^{\text{pool}} \right )_k$ has fewer columns (rep-specific rather than study-by-rep-specific). Like the independent model, we use it as a no-borrowing benchmark to quantify the borrowing strength of the hierarchical model. 

$$
\begin{aligned}
& y_k \sim \text{MVN}((X_\alpha^{\text{pool}})_k \cdot \alpha + (X_\delta)_k \cdot \delta + (X_\beta)_k \cdot \beta, \ I_{N_k} \otimes \Sigma_k ) \\
& \qquad \alpha_{t} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\alpha^2) \\
& \qquad \delta_{dt} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\
& \qquad \beta_{b} \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\
& \qquad \Sigma_k = \left (I_T \sigma_k \right ) \Lambda_k \Lambda_k' \left (I_T \sigma_k \right ) \\
& \qquad \qquad \sigma_{k1}, \ldots, \sigma_{kT} \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \\
& \qquad \qquad \Lambda_k \Lambda_k' \sim \begin{cases}
  \text{LKJ}(\text{shape} = s_\lambda, \ \text{order} = T) && m_k = 1 \\
  \text{AR(1)}(T,\rho_k) && m_k = 2 \\
  I_T && m_k = 3 \\
\end{cases} \\
& \qquad \qquad \rho_k \stackrel{\text{ind}}{\sim} \text{Uniform}(-1, 1) \qquad (\text{only for } m_k = 2)
\end{aligned} 
$$

# Borrowing metrics

The package supports the following metrics to quantify borrowing. Various functions in `historicalborrowlong` compute each of the following metrics independently for each discrete time point ("rep").

## Effective sample size (ESS)

See the `hbl_ess()` function for an implementation.

@neuenschwander2010 posit a prior effective sample size metric for meta-analytic predictive (MAP) priors. In the original paper, the underlying hierarchical model only uses historical controls, and the hypothetical new study is the current study of interest. In `historicalborrow`, we adapt this metric to a hierarchical model which also includes both control and treatment data from the current study. We still define $N$ below to be the number of (non-missing) historical control patients so we can still interpret ESS on the same scale as in the paper.

For the pooled model, define $V_0$ to be the posterior predictive variance of the control mean $\alpha^*$ of a hypothetical new unobserved study. According to @neuenschwander2010, it can be derived as an average of study-specific variances. In practice, we estimate $V_0$ using the average of MCMC samples of $\frac{1}{\sum \sigma_i^{-2}}$.

$$
V_0 := \text{Var}(\alpha^* | y, \tau = 0) = \frac{1}{\sum \sigma_i^{-2}}
$$

For the hierarchical model, we define the analogous posterior predictive variance $V_\tau$ using the prior distribution 

$$
V_\tau := \text{Var}(\alpha^* | y) = \int E[(\alpha^* - E(\alpha^*|y))^2 | y] \cdot p(\alpha^* | \mu, \tau) \cdot p(\mu, \tau | y) d\mu d\tau
$$

The above integral implies a straightforward method of estimating $V_\tau$ using MCMC samples:

1. For each MCMC sample $m = 1, \ldots, M$ from the hierarchical model, identify samples $\mu^{(m)}$ and $\tau^{(m)}$ of $\mu$ and $\tau$, respectively.
2. Draw $(\alpha^*)^{m}$ from a Normal($\mu^{(m)}$, $(\tau^{(m)})^2$) distribution.
3. Estimate $V_\tau$ as the variance of the collection $(\alpha^*)^{1}, (\alpha^*)^{2}, \ldots, (\alpha^*)^{M}$ from (2).

Next, define $N$ as the number of non-missing control patients from the historical studies only. Given $N$, $V_0$, and $V_\tau$, define the effective sample size as:

$$
\text{ESS} := N \frac{V_0}{V_\tau}
$$

$\frac{V_0}{V_\tau}$ is a weight which quantifies the fraction of historical information that the hierarchical model leverages for borrowing. Notably, the weight should be 1 if the hierarchical and pooled model exhibit the same strength of borrowing. Multiplied by $N$, the quantity becomes a heuristic for the strength of borrowing of the hierarchical model, measured in terms of the number of historical patients.

## Precision ratio (hierarchical model only)

The precision ratio is an experimental ad hoc metric and should be used with caution. It is implemented in the `hbl_summary()` function for the hierarchical model.

The precision ratio compares the prior precision of a control mean response (an $\alpha$ component, numerator) to the analogous precision of the full conditional distribution (denominator). The former is $\frac{1}{\tau^2}$, and the latter is $\frac{1}{\tau^2} + \frac{n}{\sigma^2}$. Here, $n$ is the number of non-missing patients in the current study, $\sigma^2$ is the residual variance, and $\tau^2$ is the variance of study-specific control means (components of $\alpha$). The full precision ratio is:

$$
\begin{aligned}
\frac{\frac{1}{\tau^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}
\end{aligned}
$$

The precision ratio comes from the conditional distribution of $\alpha_k$ in the hierarchical model given the other parameters and the data. More precisely, in this conditional distribution, the mean is a weighted average between the prior mean and data mean, and the precision ratio is the weight on the prior mean. This can be seen in a simpler case with a Bayesian model with a normal data model, a normal prior on the mean, and known constant variance. For details, see Chapter 2 of @bda3.

## Variance shift ratio

The variance shift ratio is an experimental ad hoc metric and should be used with caution. It is implemented in the legacy `hbl_metrics()` function.

Let $V_m$ be the estimated posterior variance of $\alpha_I$ (current study control group response mean) estimated by model $m$. The variance shift ratio is:

$$
\begin{aligned}
\frac{V_{m*} - V_{\text{independent}}}{V_{\text{pool}} - V_{\text{independent}}}
\end{aligned}
$$

where $m*$ is a historical borrowing model like the mixture model or hierarchical model.

## Mean shift ratio (legacy)

The mean shift ratio is not recommended to measure the strength of borrowing. Rather, it is an informal ad hoc measure of the lack of commensurability between the current and historical data sources. It is implemented in the legacy `hbl_metrics()` function.

To define the mean shift ratio, let $\theta_m$ be the posterior mean control group response estimated by model $m$. The mean shift ratio is:

$$
\begin{aligned}
\frac{\theta_{m*} - \theta_{\text{independent}}}{\theta_{\text{pool}} - \theta_{\text{independent}}}
\end{aligned}
$$

where $m*$ is a historical borrowing model like the mixture model or hierarchical model.

# References